Author

Tony Duan

Probability is the branch of mathematics concerning events and numerical descriptions of how likely they are to occur. The probability of an event is a number between 0 and 1; the larger the probability, the more likely an event is to occur.

1 Random numbers

1.1 Draw 10 numbers from 1 to 10

Code
a=sample(1:10,10, replace=TRUE) # draw with replacement; results vary between runs
a
 [1]  9  7  1  6  8  8  9  8 10  1

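Each number is expected about 10% of the time, but with only 10 draws the observed counts are far from even and some numbers do not appear at all.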

Code
as.data.frame(table(a))
   a Freq
1  1    2
2  6    1
3  7    1
4  8    3
5  9    2
6 10    1

1.2 Draw 10,000 numbers from 1 to 10

Code
a=sample(1:10,10000, replace=TRUE) 

With 10,000 draws, each number appears close to 10% of the time.

Code
as.data.frame(table(a))
    a Freq
1   1  986
2   2 1027
3   3 1030
4   4  975
5   5 1023
6   6  967
7   7  943
8   8  998
9   9 1052
10 10  999
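
To compare the counts with the expected 10% directly, the table can be converted to relative frequencies. The following is a minimal sketch; the seed and the names `draws` and `rel_freq` are assumptions added here for reproducibility and are not part of the original run.

```r
set.seed(123)  # assumed seed, only so the sketch is reproducible
draws <- sample(1:10, 10000, replace = TRUE)

rel_freq <- prop.table(table(draws))  # counts divided by the total number of draws
round(rel_freq, 3)
max(abs(rel_freq - 0.1))              # largest deviation from the expected 10%
```

With a larger number of draws the largest deviation tends to shrink further.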

2 Permutations and Combinations

2.1 Permutations (order does matter): 2 numbers from 4 numbers

Code
library(gtools)
all_num=4
choose=2

res<- permutations(n= all_num, r = choose, v = c(1:all_num))
res
      [,1] [,2]
 [1,]    1    2
 [2,]    1    3
 [3,]    1    4
 [4,]    2    1
 [5,]    2    3
 [6,]    2    4
 [7,]    3    1
 [8,]    3    2
 [9,]    3    4
[10,]    4    1
[11,]    4    2
[12,]    4    3
Code
print (nrow(res))
[1] 12

Number of permutations: all_num!/(all_num-choose)!

4!/(4-2)! = 4!/2!

Code
(4*3*2*1)/(2*1)
[1] 12

or

Code
factorial(4) / factorial(4-2)
[1] 12
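
In general, the number of ordered selections of r items from n is n!/(n-r)!; for n = 4 and r = 2 the denominator is 2!, which happens to equal r!. A small sketch that makes the general rule explicit (the helper name `n_perm` is hypothetical, chosen here for illustration):

```r
# Number of permutations of r items chosen from n: n! / (n - r)!
n_perm <- function(n, r) factorial(n) / factorial(n - r)

n_perm(4, 2)  # 12, the same as the number of rows listed above
n_perm(5, 2)  # 20, i.e. 5 * 4
```

The second call shows why the denominator must be (n-r)!: 5!/2! would give 60, which is not the number of ordered pairs from 5 items.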

2.2 Combinations (order does not matter): 2 numbers from 4 numbers

Code
library(gtools)
all_num=4
choose=2

res<- combinations(n= all_num, r = choose, v = c(1:all_num))
res
     [,1] [,2]
[1,]    1    2
[2,]    1    3
[3,]    1    4
[4,]    2    3
[5,]    2    4
[6,]    3    4
Code
print (nrow(res))
[1] 6

Number of combinations: all_num!/((all_num-choose)! * choose!)

4!/((4-2)! * 2!)

Code
(4*3*2*1)/((2*1)*(2*1))
[1] 6

or

Code
factorial(4) / (factorial(4-2)*factorial(2))
[1] 6
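
Base R also provides `choose()` for the count and `combn()` for the enumeration, so the result can be cross-checked without gtools. A short sketch (the name `combos` is an assumption; a variable called `choose`, as used above, does not stop the function `choose()` from being called, because R looks for a function when a name is used in a call):

```r
choose(4, 2)           # number of combinations: 6

combos <- combn(4, 2)  # each column is one combination
t(combos)              # transpose to show one combination per row
ncol(combos)           # 6
```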

3 Conditional probability

Each person snores with probability 20%, independently of the others, and there are 4 people in one room.

What is the probability that at least one person in the room snores?

Code
p=0.2
n=4

3.1 Solution 1: P(at least one) = P(1 snoring) + P(2 snoring) + P(3 snoring) + P(4 snoring)

3.1.1 0 snoring

Code
p0=(0.8*0.8*0.8*0.8)
p0
[1] 0.4096

3.1.2 1 snoring

Code
p1=(0.2*0.8*0.8*0.8)*4
p1
[1] 0.4096

3.1.3 2 snoring

Choose which 2 of the 4 people snore: factorial(4) / (factorial(4-2)*factorial(2))

6 combinations in total:

Code
factorial(4) / (factorial(4-2)*factorial(2))
[1] 6
Code
p2=(0.2*0.2*0.8*0.8)*6
p2
[1] 0.1536

3.1.4 3 snoring

Choose which 3 of the 4 people snore (combinations, order does not matter): factorial(4) / (factorial(4-3)*factorial(3))

4 combinations:

Code
factorial(4) / (factorial(4-3)*factorial(3))
[1] 4
Code
p3=(0.2*0.2*0.2*0.8)*4
p3
[1] 0.0256

3.1.5 4 snoring

Code
p4=(0.2*0.2*0.2*0.2)
p4
[1] 0.0016

3.1.6 at least one:

Code
P_at_least_one=p1+p2+p3+p4
P_at_least_one
[1] 0.5904
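
The five cases above follow one pattern: P(k snoring) = C(n, k) * p^k * (1-p)^(n-k). As a sketch, the whole calculation can be vectorised over k; the names `p_snore`, `n_people`, `k` and `pk` are illustrative ones chosen here, not part of the original code.

```r
p_snore  <- 0.2   # probability that one person snores
n_people <- 4     # people in the room
k <- 0:n_people   # possible numbers of snorers

# C(n, k) * p^k * (1 - p)^(n - k), evaluated for every k at once
pk <- choose(n_people, k) * p_snore^k * (1 - p_snore)^(n_people - k)
pk
sum(pk[k >= 1])   # probability of at least one snorer
```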

3.2 Solution 2: P(at least one) = 1 - P(no one snoring)

Code
P_at_least_one2=1-0.8*0.8*0.8*0.8
P_at_least_one2
[1] 0.5904
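
Both solutions can also be checked by simulation. The sketch below assumes the four people snore independently; the seed, the number of simulated rooms and the variable names are arbitrary choices made here.

```r
set.seed(1)        # assumed seed for reproducibility
n_rooms <- 100000  # number of simulated rooms (arbitrary choice)

# Each row is one room of 4 people; 1 = snores, 0 = does not snore
snores <- matrix(rbinom(n_rooms * 4, size = 1, prob = 0.2), ncol = 4)

est <- mean(rowSums(snores) >= 1)  # share of rooms with at least one snorer
est
```

The estimate should land close to the exact value 0.5904, with small run-to-run variation.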

4 Derangement problem

4.1 Question 1. Arrange 4 numbers in 4 positions at random. What is the probability that none is in its correct position (all wrong)?

4.1.1 Permutations (order matters)

4! = 4*3*2*1 = 24 permutations

Code
4*3*2*1
[1] 24

4.1.2 Derangements

A derangement is a permutation in which no element stays in its original position. There are 9 derangements of 4 numbers (all wrong).

Code
e=exp(1) # Euler's number
# D(n) = floor((n!+1)/e) for n >= 1, so D(4) = floor((4!+1)/e)

D_4=floor((4*3*2*1+1)/e)
D_4
[1] 9

So the probability that all 4 numbers are wrong is

Code
Q1=(floor((4*3*2*1+1)/e))/(4*3*2*1)
Q1
[1] 0.375
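
The closed form with e can be cross-checked with the standard recurrence D(n) = (n-1) * (D(n-1) + D(n-2)), starting from D(0) = 1 and D(1) = 0, which avoids floating-point rounding. A minimal sketch (the helper name `derange_count` is hypothetical):

```r
# Number of derangements of n items via the recurrence
# D(n) = (n - 1) * (D(n - 1) + D(n - 2)), with D(0) = 1 and D(1) = 0
derange_count <- function(n) {
  if (n == 0) return(1)
  if (n == 1) return(0)
  d_prev2 <- 1  # D(0)
  d_prev1 <- 0  # D(1)
  for (i in 2:n) {
    d_now   <- (i - 1) * (d_prev1 + d_prev2)
    d_prev2 <- d_prev1
    d_prev1 <- d_now
  }
  d_prev1
}

sapply(0:5, derange_count)       # 1 0 1 2 9 44
derange_count(4) / factorial(4)  # 0.375
```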

4.2 Question 2. What is the probability that exactly 1 of the 4 numbers is in its correct position?

4.2.1 Permutations (order matters)

4! = 4*3*2*1 = 24 permutations

Code
4*3*2*1
[1] 24

4.2.2 Derangements

Any one of the 4 numbers can be the correct one; the remaining 3 numbers must all be wrong, which is the derangement problem D(3).

Code
e=exp(1) # Euler's number
# D(3) = floor((3!+1)/e)

D_3=floor((3*2*1+1)/e)
D_3
[1] 2

So the probability that exactly 1 number is correct is

Code
Q2=(4*D_3)/(4*3*2*1) # 4 choices for the correct number, D(3) = 2 ways to get the rest wrong
Q2
[1] 0.3333333

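4.3 Question 3. What is the probability that exactly 2 of the 4 numbers are correct?

Choosing which 2 numbers are correct is the same as choosing 2 numbers from 4: 6 combinations (order does not matter). The remaining 2 numbers must both be wrong, which can happen in only D(2) = 1 way (they swap places), so there are 6 * 1 = 6 such permutations.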

Code
factorial(4) / (factorial(4-2)*factorial(2))
[1] 6

So the probability that exactly 2 numbers are correct is

Code
Q3=6/(4*3*2*1)
Q3
[1] 0.25

4.4 Question 4. What is the probability that exactly 3 of the 4 numbers are correct?

This cannot happen: if 3 numbers are correct, the last one must also be correct. The probability is 0.

4.5 Question 5. What is the probability that all 4 numbers are correct?

Only 1 of the 24 permutations has every number in its correct position.

Code
Q5=1/(4*3*2*1)
Q5
[1] 0.04166667

So the total probability of all events is 1 (exactly 3 correct contributes 0)

Code
Q1+Q2+Q3+0+Q5
[1] 1
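
The counts used in this section can be verified by brute force: list all 24 permutations of 1 to 4 and count how many numbers stay in their own position in each. The sketch below uses only base R; the helper `all_perms` and the other names are hypothetical, written here for illustration.

```r
# Recursively build every permutation of the vector v, one per row
all_perms <- function(v) {
  if (length(v) <= 1) return(matrix(v, nrow = 1))
  out <- NULL
  for (i in seq_along(v)) {
    rest <- all_perms(v[-i])             # permutations of the remaining items
    out  <- rbind(out, cbind(v[i], rest))
  }
  out
}

perms <- all_perms(1:4)
n_correct <- apply(perms, 1, function(p) sum(p == 1:4))  # correct positions per permutation

counts <- table(factor(n_correct, levels = 0:4))
counts                # 9, 8, 6, 0 and 1 permutations with 0, 1, 2, 3 and 4 correct
counts / nrow(perms)  # the corresponding probabilities
```

The zero for exactly 3 correct reflects that fixing 3 positions forces the fourth to be correct as well.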

5 Distributions

5.1 Binomial distribution

The binomial distribution with parameters n and p is the discrete probability distribution of the number of successes in a sequence of n independent experiments, each asking a yes–no question, and each with its own Boolean-valued outcome: success (with probability p) or failure (with probability 1-p).

For a single trial, i.e., n = 1, the binomial distribution is a Bernoulli distribution.

5.1.1 Probability mass function (pmf)

Probability that exactly 1 person snores

Code
n = 4 # number of people in a room
p = 0.2 # probability that one person snores

dbinom(x=1, size=n, prob=p) # P(exactly 1 person snores)
[1] 0.4096

Probability that exactly 0, 1, 2, 3 or 4 people snore

Code
n = 4 # number of people in a room
p = 0.2 # probability that one person snores

dbinom(x=c(0,1,2,3,4), size=n, prob=p) # P(exactly 0, 1, 2, 3, 4 people snore)
[1] 0.4096 0.4096 0.1536 0.0256 0.0016

The probabilities of all possible outcomes always sum to 1

Code
sum(dbinom(x=c(0,1,2,3,4), size=n, prob=p))
[1] 1

5.1.2 Cumulative distribution function (cdf)

Probability that at most 1 person snores (<= 1)

Code
pbinom(q=1, size=n,prob=p, lower.tail=TRUE) 
[1] 0.8192
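
The cdf is the running total of the pmf, and its upper tail gives the "at least one" probability from section 3. A brief sketch (the names `n_people` and `p_snore` are chosen here for illustration):

```r
n_people <- 4
p_snore  <- 0.2

cumsum(dbinom(0:4, size = n_people, prob = p_snore))  # running totals of the pmf
pbinom(0:4, size = n_people, prob = p_snore)          # the same values from the cdf

# P(at least 1 snorer) = P(X > 0), the upper tail beyond 0
pbinom(0, size = n_people, prob = p_snore, lower.tail = FALSE)  # 0.5904
```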

5.1.3 Generate 1,000 random numbers from a binomial distribution with size=4 and probability=0.2

Code
a=rbinom(1000,size=4,prob=0.2) 

table(a)
a
  0   1   2   3 
418 397 160  25 
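
The simulated proportions can be set next to the theoretical probabilities from `dbinom()`. In this sketch the seed and variable names are assumptions; `factor(..., levels = 0:4)` keeps outcomes that never occurred (such as 4 in the run above) in the table with a count of zero.

```r
set.seed(2024)  # assumed seed for reproducibility
sim <- rbinom(1000, size = 4, prob = 0.2)

observed <- as.vector(prop.table(table(factor(sim, levels = 0:4))))
expected <- dbinom(0:4, size = 4, prob = 0.2)

round(cbind(k = 0:4, observed, expected), 3)  # simulated vs. theoretical
```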

5.2 Normal distribution (also called Gaussian distribution)

X is a random variable following a normal distribution with mean μ and variance σ².

About 68% of values fall within 1 standard deviation of the mean

About 95% fall within 2 standard deviations

About 99.7% fall within 3 standard deviations
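
These three percentages can be reproduced from the standard normal cdf. A short sketch (the names `sds` and `coverage` are chosen here for illustration):

```r
sds <- 1:3
coverage <- pnorm(sds) - pnorm(-sds)  # P(-k < Z < k) for k = 1, 2, 3
round(coverage, 4)                    # 0.6827 0.9545 0.9973
```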

5.2.1 Z score and standard normal distribution

The standard normal distribution is the special normal distribution with mean = 0 and standard deviation = 1.

Z table

5.2.2 Standardization

Transform any normal distribution into the standard normal distribution.

formula: Z = (X - μ) / σ
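
Standardization subtracts the mean and divides by the standard deviation, so a probability under any normal distribution can be read from the standard normal one. A minimal sketch with illustrative values (x = 70, mean 75, standard deviation 5 are assumptions chosen for the example):

```r
x <- 70
mu <- 75
sigma <- 5

z <- (x - mu) / sigma            # z score: -1, i.e. 1 standard deviation below the mean
z

pnorm(z)                         # probability from the standard normal distribution
pnorm(x, mean = mu, sd = sigma)  # same probability from the original distribution
```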

5.2.3 R function

5.2.4 Probability Density Function (pdf)

Compute the pdf at location 0 of N(1,4), the normal distribution with mean 1 and variance 4.

sd is the standard deviation, which is the square root of the variance (here sqrt(4) = 2).

Code
dnorm(0, mean = 1, sd = 2)
[1] 0.1760327
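
The value returned by `dnorm()` can be reproduced from the normal density formula f(x) = 1/(σ * sqrt(2π)) * exp(-(x-μ)²/(2σ²)). A sketch (the variable names are chosen here for illustration):

```r
x <- 0
mu <- 1
sigma <- 2

# Normal density written out by hand
manual <- 1 / (sigma * sqrt(2 * pi)) * exp(-(x - mu)^2 / (2 * sigma^2))
manual
all.equal(manual, dnorm(x, mean = mu, sd = sigma))  # TRUE
```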

5.2.5 cumulative distribution function(cdf)

Probability of a value <= 70 from a normal distribution with mean=75 and sd=5

70 is 1 standard deviation below the mean

Code
pnorm(q=70,mean=75,sd=5)
[1] 0.1586553

Probability of a value >= 80 from a normal distribution with mean=75 and sd=5

80 is 1 standard deviation above the mean

Code
1-pnorm(q=80,mean=75,sd=5)
[1] 0.1586553

5.2.6 quantile function

First quartile (Q1, the 25th percentile)

Code
qnorm(p=0.25,mean=75,sd=5)
[1] 71.62755

Third quartile (Q3, the 75th percentile)

Code
qnorm(p=0.75,mean=75,sd=5)
[1] 78.37245
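
`qnorm()` is the inverse of `pnorm()`: feeding a quantile back into the cdf returns the original probability. A short sketch (the names `q1` and `q3` are chosen here for illustration):

```r
q1 <- qnorm(0.25, mean = 75, sd = 5)
q3 <- qnorm(0.75, mean = 75, sd = 5)

pnorm(q1, mean = 75, sd = 5)  # 0.25, back to the probability we started from
q3 - q1                       # interquartile range, about 6.74
```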

5.2.7 Random number generator

5.2.8 Generate 1,000 numbers from a normal distribution with mean=75 and sd=5

Code
nd=rnorm(n=1000,mean=75,sd=5)
nd=sort(nd)
Code
mean(nd)
[1] 74.8938
Code
sd(nd)
[1] 5.212053
Code
hist(nd)

Code
dens=dnorm(nd,mean=mean(nd),sd=sd(nd))
Code
plot(nd,dens,type='l')

5.2.9 Check whether data are normally distributed

Code
nd_data=rnorm(n=1000,mean=0,sd=2)
nd_data=sort(nd_data)
Code
non_nd_data=1:1000 # evenly spaced values (already sorted), clearly not normal

5.2.9.1 Method 1: histogram

Code
#define plotting region
par(mfrow=c(1,2)) 

#create histogram for both datasets
hist(nd_data, col='steelblue', main='Normal')
hist(non_nd_data, col='steelblue', main='Non-normal')

5.2.9.2 Method 2: Q-Q plot

Code
#define plotting region
par(mfrow=c(1,2)) 

#create Q-Q plot for both datasets
qqnorm(nd_data, main='Normal')
qqline(nd_data)

qqnorm(non_nd_data, main='Non-normal')
qqline(non_nd_data)

5.2.9.3 Method 3: Shapiro-Wilk test

Null hypothesis (H0): the data are normally distributed.

If the p-value is >= 0.05, we fail to reject H0: the data are consistent with a normal distribution.

Code
#perform shapiro-wilk test
shapiro.test(nd_data)

    Shapiro-Wilk normality test

data:  nd_data
W = 0.99895, p-value = 0.8451

If the p-value is < 0.05, we reject the null hypothesis: the data are not normally distributed.

Code
#perform shapiro-wilk test
shapiro.test(non_nd_data)

    Shapiro-Wilk normality test

data:  non_nd_data
W = 0.95481, p-value < 2.2e-16
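
The result of `shapiro.test()` is a list whose `p.value` element can be used in code instead of reading the printed output. A minimal sketch (the helper `looks_normal` and the 0.05 default are assumptions for illustration; note that `shapiro.test()` only accepts between 3 and 5000 observations):

```r
# TRUE means "no evidence against normality at level alpha"
looks_normal <- function(x, alpha = 0.05) {
  res <- shapiro.test(x)
  res$p.value >= alpha
}

looks_normal(1:1000)  # FALSE: evenly spaced values are rejected as non-normal
```

A TRUE result only means the test did not detect a departure from normality; it does not prove the data are normal.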

5.3 Student's t distribution

5.3.1 one sample t test

5.3.2 two sample t test

5.3.3 Paired t test

5.3.4 Pearson correlation coefficient

5.4 F distribution

5.4.1 ANOVA

5.5 Chi-square

5.5.1 Chi-square goodness of fit test

5.5.2 Chi-square test of independence

5.6 Poisson Distribution

6 Reference

https://www.huber.embl.de/users/kaspar/biostat_2021/2-demo.html

https://www.youtube.com/watch?v=peEsXbdMY_4

https://www.youtube.com/watch?v=ETd-jPhI_tE

https://www.youtube.com/watch?v=kvmSAXhX9Hs

https://www.youtube.com/watch?v=RlhnNbPZC0A

https://www.youtube.com/watch?v=X5NXDK6AVtU

https://www.scribbr.com/statistics/probability-distributions/

https://www.youtube.com/watch?v=Q_pO9NzWxPY

https://www.statology.org/test-for-normality-in-r/

https://en.wikipedia.org/wiki/Derangement

Code
sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: aarch64-apple-darwin20
Running under: macOS 15.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Asia/Shanghai
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] gtools_3.9.5

loaded via a namespace (and not attached):
 [1] htmlwidgets_1.6.4 compiler_4.4.1    fastmap_1.2.0     cli_3.6.4        
 [5] tools_4.4.1       htmltools_0.5.8.1 rstudioapi_0.17.1 yaml_2.3.10      
 [9] rmarkdown_2.28    knitr_1.48        jsonlite_1.8.9    xfun_0.48        
[13] digest_0.6.37     rlang_1.1.5       evaluate_1.0.1   