Probability

Author

Tony Duan

Probability is the branch of mathematics concerning events and numerical descriptions of how likely they are to occur. The probability of an event is a number between 0 and 1; the larger the probability, the more likely an event is to occur.

1 Random numbers

1.1 Draw 10 numbers from 1 to 10

Code

a=sample(1:10,10, replace=TRUE)  # sample with replacement
a

 [1]  9  7  1  6  8  8  9  8 10  1

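Each number has probability 10%, but with only 10 draws the observed frequencies can be far from that (here 8 appears three times and 2, 3, 4, and 5 never appear).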

Code

as.data.frame(table(a))

1.2 Draw 10,000 numbers from 1 to 10

Code

a=sample(1:10,10000, replace=TRUE)

With 10,000 draws, each number appears in close to 10% of the draws.

Code

as.data.frame(table(a))
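The effect of sample size can be checked by comparing relative frequencies directly. This is a minimal sketch: the seed value and the names `rel_freq`, `small`, and `large` are assumptions chosen for illustration, not part of the original code.

```r
set.seed(123)  # assumed seed, only to make the sketch reproducible

# Relative frequency of each value 1..10 in a sample of the given size
rel_freq <- function(size) {
  draws <- sample(1:10, size, replace = TRUE)
  # factor() keeps values that were never drawn, with frequency 0
  as.vector(table(factor(draws, levels = 1:10))) / size
}

small <- rel_freq(10)
large <- rel_freq(10000)

# Largest gap between an observed frequency and the expected 0.1
max(abs(small - 0.1))
max(abs(large - 0.1))
```

The gap for the large sample should be much smaller, which is what "around 10%" means in practice.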

2 Permutations and Combinations

2.1 Permutations (order does matter): 2 numbers from 4 numbers

Code

library(gtools)
all_num=4
choose=2

res<- permutations(n= all_num, r = choose, v = c(1:all_num))
res

      [,1] [,2]
 [1,]    1    2
 [2,]    1    3
 [3,]    1    4
 [4,]    2    1
 [5,]    2    3
 [6,]    2    4
 [7,]    3    1
 [8,]    3    2
 [9,]    3    4
[10,]    4    1
[11,]    4    2
[12,]    4    3

Code

print (nrow(res))

[1] 12

The number of permutations is all_num!/(all_num-choose)!

4!/(4-2)! = 4!/2!

Code

(4*3*2*1)/(2*1)

[1] 12

Code

factorial(4) / factorial(4-2)

[1] 12
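The permutation count is n!/(n-r)!. With n = 4 and r = 2 this happens to equal n!/r!, so a second case helps to tell the two apart. The sketch below uses base R only; the helper name `n_permutations` and the n = 5, r = 3 case are assumptions added for illustration.

```r
# Number of ordered selections of r items from n distinct items
n_permutations <- function(n, r) factorial(n) / factorial(n - r)

n_permutations(4, 2)  # the case shown above

# Cross-check n = 5, r = 3 by enumerating all ordered triples
grid <- expand.grid(first = 1:5, second = 1:5, third = 1:5)
# keep only the rows whose three entries are all different
distinct <- apply(grid, 1, function(row) length(unique(row)) == 3)

sum(distinct)                 # count by enumeration
n_permutations(5, 3)          # count by formula
factorial(5) / factorial(3)   # n!/r! gives a different count here
```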

2.2 Combinations (order does not matter): 2 numbers from 4 numbers

Code

library(gtools)
all_num=4
choose=2

res<- combinations(n= all_num, r = choose, v = c(1:all_num))
res

     [,1] [,2]
[1,]    1    2
[2,]    1    3
[3,]    1    4
[4,]    2    3
[5,]    2    4
[6,]    3    4

Code

print (nrow(res))

[1] 6

The number of combinations is all_num!/((all_num-choose)! * choose!)

4!/((4-2)! * 2!)

Code

(4*3*2*1)/((2*1)*(2*1))

[1] 6

Code

factorial(4) / (factorial(4-2)*factorial(2))

[1] 6
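Base R also provides `choose()` for the count and `combn()` for the enumeration, so the same result can be obtained without gtools. A short sketch; the variable name `pair_set` is an assumption for illustration.

```r
choose(4, 2)                 # number of combinations

pair_set <- combn(4, 2)      # each column is one combination
ncol(pair_set)

# every combination of 2 items can be ordered in factorial(2) ways,
# which recovers the permutation count
choose(4, 2) * factorial(2)
```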

3 Probability of at least one event (complement rule)

Each person snores with probability 20%, independently of the others, and there are 4 people in one room.

What is the probability that at least one person in the room snores?

Code

p=0.2
n=4

3.1 Solution 1: P(at least one) = P(1 snoring) + P(2 snoring) + P(3 snoring) + P(4 snoring)

3.1.1 0 snoring

Code

p0=(0.8*0.8*0.8*0.8)
p0

[1] 0.4096

3.1.2 1 snoring

Code

p1=(0.2*0.8*0.8*0.8)*4
p1

[1] 0.4096

3.1.3 2 snoring

Choose 2 from 4: factorial(4) / (factorial(4-2)*factorial(2))

6 combinations in total

Code

factorial(4) / (factorial(4-2)*factorial(2))

[1] 6

Code

p2=(0.2*0.2*0.8*0.8)*6
p2

[1] 0.1536

3.1.4 3 snoring

Choose 3 from 4 (combinations, order does not matter): factorial(4) / (factorial(4-3)*factorial(3))

4 combinations:

Code

factorial(4) / (factorial(4-3)*factorial(3))

[1] 4

Code

p3=(0.2*0.2*0.2*0.8)*4
p3

[1] 0.0256

3.1.5 4 snoring

Code

p4=(0.2*0.2*0.2*0.2)
p4

[1] 0.0016

3.1.6 at least one:

Code

P_at_least_one=p1+p2+p3+p4
P_at_least_one

[1] 0.5904

3.2 Solution 2: P(at least one) = 1 - P(no one snoring)

Code

P_at_least_one2=1-0.8*0.8*0.8*0.8
P_at_least_one2

[1] 0.5904
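The complement rule generalizes to any p and n as 1 - (1 - p)^n, and a simulation gives an independent check. This is a sketch under the assumption that snoring is independent across people; the seed, the number of simulated rooms, and the names `p_at_least_one`, `exact`, and `simulated` are illustrative.

```r
# P(at least one success) for n independent trials with success probability p
p_at_least_one <- function(p, n) 1 - (1 - p)^n

exact <- p_at_least_one(0.2, 4)
exact

set.seed(42)       # assumed seed for reproducibility
n_rooms <- 100000  # assumed number of simulated rooms

# number of snorers in each simulated room of 4 people
snorers <- rbinom(n_rooms, size = 4, prob = 0.2)
simulated <- mean(snorers >= 1)
simulated
```

The simulated share of rooms with at least one snorer should land close to the exact value.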

4 Derangement problem

4.1 Question 1. What is the probability that a random ordering of 4 numbers has 0 numbers in the correct position (all wrong)?

4.1.1 Permutations (order matters)

4! = 4*3*2*1 = 24 permutations

Code

4*3*2*1

[1] 24

4.1.2 Derangement

There are 9 derangements (permutations in which every number is in the wrong position).

Code

e=exp(1)  # Euler's number at full precision
# D(4) = floor((4! + 1) / e)

D_4=floor((4*3*2*1+1)/e)
D_4

[1] 9

So the probability that all 4 numbers are in the wrong position is

Code

Q1=(floor((4*3*2*1+1)/e))/(4*3*2*1)
Q1

[1] 0.375
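The floor formula can be cross-checked against the inclusion-exclusion sum D(n) = n! * (1/0! - 1/1! + 1/2! - ... + (-1)^n/n!). A minimal sketch; the function names `derangements` and `floor_formula` and the range 1 to 8 are assumptions for illustration.

```r
# Derangement count from the inclusion-exclusion sum
derangements <- function(size) {
  k <- 0:size
  # round() removes floating-point noise from the alternating sum
  round(factorial(size) * sum((-1)^k / factorial(k)))
}

# Derangement count from the floor formula used above (valid for size >= 1)
floor_formula <- function(size) floor((factorial(size) + 1) / exp(1))

sizes <- 1:8
by_sum <- sapply(sizes, derangements)
by_floor <- sapply(sizes, floor_formula)

rbind(sizes, by_sum, by_floor)
```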

4.2 Question 2. What is the probability that a random ordering of 4 numbers has exactly 1 number in the correct position?

4.2.1 Permutations (order matters)

4! = 4*3*2*1 = 24 permutations

Code

4*3*2*1

[1] 24

4.2.2 Derangement

Any one of the 4 numbers can be the correct one. The remaining 3 numbers must all be wrong, which is the D(3) derangement problem.

Code

e=exp(1)  # Euler's number at full precision
# D(3) = floor((3! + 1) / e)

D_3=floor((3*2*1+1)/e)
D_3

[1] 2

There are 4 * D(3) = 4 * 2 = 8 such permutations, so the probability of exactly 1 correct is

Code

Q2=(4*2)/(4*3*2*1)
Q2

[1] 0.3333333

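4.3 Question 3. What is the probability that a random ordering of 4 numbers has exactly 2 numbers in the correct position?

Choosing which 2 numbers are correct is the same as choosing 2 numbers from 4: 6 combinations (order does not matter). The remaining 2 numbers must both be wrong, and there is only one way to do that (swap them, D(2) = 1), so there are 6 * 1 = 6 such permutations.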

Code

factorial(4) / (factorial(4-2)*factorial(2))

[1] 6

So the probability of exactly 2 correct is

Code

Q3=6/(4*3*2*1)
Q3

[1] 0.25

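4.4 Question 4. What is the probability that a random ordering of 4 numbers has exactly 3 numbers in the correct position?

This is impossible: if 3 numbers are in the correct position, the last one must be correct too. So the probability of exactly 3 correct is 0, and the only remaining case is all 4 correct, which happens in 1 of the 24 permutations: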

Code

Q4=1/(4*3*2*1)
Q4

[1] 0.04166667

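4.5 Question 5. What is the probability that a random ordering of 4 numbers has all 4 numbers in the correct position?

Only 1 of the 24 permutations has all 4 numbers correct, so the probability is 1/24. This is the value stored in Q4 above.

The probabilities of all possible cases (0, 1, 2, and 4 correct) sum to 1: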

Code

Q1+Q2+Q3+Q4

[1] 1
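With only 24 permutations, all five answers can be verified by brute force. The sketch below enumerates every ordering of 1 to 4 and counts how many values sit in their own position; the recursive helper `all_perms` and the other names are assumptions written for this illustration, not part of gtools.

```r
# All permutations of a vector, one per row (simple recursive sketch)
all_perms <- function(v) {
  if (length(v) <= 1) return(matrix(v, nrow = 1))
  do.call(rbind, lapply(seq_along(v), function(i) {
    # put v[i] first, then append every permutation of the rest
    cbind(v[i], all_perms(v[-i]))
  }))
}

perm_matrix <- all_perms(1:4)
nrow(perm_matrix)

# for each permutation, count positions where the value equals its position
n_correct <- colSums(t(perm_matrix) == 1:4)

# factor() keeps the impossible case "3 correct" in the table with count 0
counts <- table(factor(n_correct, levels = 0:4))
counts
counts / nrow(perm_matrix)
```

The counts for 0, 1, 2, 3, and 4 correct come out as 9, 8, 6, 0, and 1, matching the answers to Questions 1 to 5.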

5 Distribution

5.1 Binomial distribution

The binomial distribution with parameters n and p is the discrete probability distribution of the number of successes in a sequence of n independent experiments, each asking a yes–no question, and each with its own Boolean-valued outcome: success (with probability p) or failure (with probability 1-p).

For a single trial, i.e., n = 1, the binomial distribution is a Bernoulli distribution.

5.1.1 Probability mass function (pmf)

Probability that exactly 1 person snores

Code

n = 4 # number of people in a room
p = 0.2 # probability that one person snores

dbinom(x=1, size=n, prob=p) # P(exactly 1 person snores)

[1] 0.4096

Probability of 0, 1, 2, 3, or 4 people snoring

Code

n = 4 # number of people in a room
p = 0.2 # probability that one person snores

dbinom(x=c(0,1,2,3,4), size=n, prob=p) # P(exactly 0, 1, 2, 3, 4 people snore)

[1] 0.4096 0.4096 0.1536 0.0256 0.0016

The probabilities of all possible outcomes always sum to 1

Code

sum(dbinom(x=c(0,1,2,3,4), size=n, prob=p))

[1] 1
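The values returned by `dbinom()` can be reproduced from the binomial formula P(X = k) = choose(n, k) * p^k * (1 - p)^(n - k), which is the same calculation done by hand in the snoring example. A minimal sketch; the names `size`, `prob`, `k`, and `manual` are chosen for illustration.

```r
size <- 4    # number of people in the room
prob <- 0.2  # probability that one person snores
k <- 0:size  # possible numbers of snorers

# binomial formula written out term by term
manual <- choose(size, k) * prob^k * (1 - prob)^(size - k)
manual

# should agree with dbinom() up to floating-point error
all.equal(manual, dbinom(k, size = size, prob = prob))
```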

5.1.2 Cumulative distribution function (cdf)

Probability that at most 1 person snores (X <= 1)

Code

pbinom(q=1, size=n,prob=p, lower.tail=TRUE)

[1] 0.8192
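The cdf is the running sum of the pmf, and the upper tail gives the "at least one snoring" probability from the earlier section. A short sketch; the variable names are assumptions for illustration.

```r
pmf <- dbinom(0:4, size = 4, prob = 0.2)
cdf <- cumsum(pmf)  # P(X <= 0), P(X <= 1), ..., P(X <= 4)
cdf

at_most_one <- pbinom(1, size = 4, prob = 0.2)

# lower.tail = FALSE returns P(X > q), so q = 0 gives P(X >= 1)
at_least_one <- pbinom(0, size = 4, prob = 0.2, lower.tail = FALSE)
at_least_one
```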

5.1.3 Generate 1,000 random numbers from a binomial distribution with size=4 and prob=0.2

Code

a=rbinom(1000,size=4,prob=0.2)

table(a)

a
  0   1   2   3 
418 397 160  25
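The simulated counts can be put next to the theoretical probabilities from `dbinom()`. This is a sketch with an assumed seed and an assumed sample size of 10,000; the names `draws`, `empirical`, and `theoretical` are illustrative.

```r
set.seed(2024)    # assumed seed for reproducibility
n_draws <- 10000  # assumed sample size
draws <- rbinom(n_draws, size = 4, prob = 0.2)

# factor() keeps the outcome 4 in the table even if it is never drawn
empirical <- as.vector(table(factor(draws, levels = 0:4))) / n_draws
theoretical <- dbinom(0:4, size = 4, prob = 0.2)

round(rbind(empirical, theoretical), 4)
```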

5.2 Normal distribution (also called Gaussian distribution)

X is a random variable following a normal distribution with mean μ and variance σ².

About 68% of values fall within 1 standard deviation of the mean

About 95% fall within 2 standard deviations

About 99.7% fall within 3 standard deviations
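These three percentages can be recovered from the normal cdf with `pnorm()`. A minimal sketch using the standard normal distribution; the names `k` and `within_k_sd` are assumptions for illustration.

```r
k <- 1:3  # number of standard deviations

# P(-k <= Z <= k) for a standard normal variable Z
within_k_sd <- pnorm(k) - pnorm(-k)
round(within_k_sd, 4)
```

The same proportions hold for any mean and standard deviation, because the rule is stated in units of standard deviations.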

5.2.1 Z score and standard normal distribution

The standard normal distribution is the special normal distribution with mean=0 and standard deviation=1

Z table

5.2.2 Standardization

Transform any normal distribution into the standard normal distribution

Formula: z = (x − μ) / σ
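Standardization subtracts the mean and divides by the standard deviation, z = (x − μ) / σ. The sketch below applies it to a few values; the mean of 75 and standard deviation of 5 are borrowed from the examples later in this document, and the variable names are assumptions.

```r
mu <- 75     # mean (borrowed from the later examples)
sigma <- 5   # standard deviation
x <- c(70, 75, 80)

z <- (x - mu) / sigma  # z scores
z

# the cdf of the original distribution at x equals
# the standard normal cdf at z
all.equal(pnorm(x, mean = mu, sd = sigma), pnorm(z))
```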

5.2.3 R function

5.2.4 Probability Density Function (pdf)

Compute the pdf at x = 0 of N(1, 4), the normal distribution with mean 1 and variance 4.

sd is the standard deviation, which is the square root of the variance, so sd = 2.

Code

dnorm(0, mean = 1, sd = 2)

[1] 0.1760327
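The value returned by `dnorm()` can be reproduced from the normal density formula f(x) = exp(-(x − μ)² / (2σ²)) / (σ · sqrt(2π)). A minimal sketch; the helper name `normal_pdf` is an assumption for illustration.

```r
# Normal density written out from the formula
normal_pdf <- function(x, mean, sd) {
  exp(-(x - mean)^2 / (2 * sd^2)) / (sd * sqrt(2 * pi))
}

normal_pdf(0, mean = 1, sd = 2)

# should agree with dnorm() up to floating-point error
all.equal(normal_pdf(0, mean = 1, sd = 2), dnorm(0, mean = 1, sd = 2))
```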

5.2.5 Cumulative distribution function (cdf)

Probability of a value <= 70 from a normal distribution with mean=75 and sd=5

70 is 1 standard deviation below the mean

Code

pnorm(q=70,mean=75,sd=5)

[1] 0.1586553

Probability of a value >= 80 from a normal distribution with mean=75 and sd=5

80 is 1 standard deviation above the mean

Code

1-pnorm(q=80,mean=75,sd=5)

[1] 0.1586553
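Two common variations on these calls are the `lower.tail = FALSE` argument, which returns the upper tail directly, and the difference of two cdf values, which gives the probability of an interval. A short sketch; the names `upper` and `between` are assumptions for illustration.

```r
# P(X > 80), equivalent to 1 - pnorm(80, ...)
upper <- pnorm(80, mean = 75, sd = 5, lower.tail = FALSE)
upper

# P(70 <= X <= 80): within 1 standard deviation of the mean
between <- pnorm(80, mean = 75, sd = 5) - pnorm(70, mean = 75, sd = 5)
between
```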

5.2.6 Quantile function

First quartile (25th percentile):

Code

qnorm(p=0.25,mean=75,sd=5)

[1] 71.62755

Third quartile (75th percentile):

Code

qnorm(p=0.75,mean=75,sd=5)

[1] 78.37245
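The quantile function is the inverse of the cdf: `qnorm()` takes a probability and returns the value below which that share of the distribution lies. A minimal sketch; the names `first_quartile`, `third_quartile`, and `iqr` are assumptions for illustration.

```r
first_quartile <- qnorm(0.25, mean = 75, sd = 5)
third_quartile <- qnorm(0.75, mean = 75, sd = 5)

# feeding the quantile back into the cdf recovers the probability
pnorm(first_quartile, mean = 75, sd = 5)

# interquartile range of the distribution
iqr <- third_quartile - first_quartile
iqr
```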

5.2.7 Random number generator

5.2.8 Generate 1,000 numbers from a normal distribution with mean=75 and sd=5

Code

nd=rnorm(n=1000,mean=75,sd=5)
nd=sort(nd)

Code

mean(nd)

[1] 74.8938

Code

sd(nd)

[1] 5.212053

Code

hist(nd)

Code

dens=dnorm(nd,mean=mean(nd),sd=sd(nd))

Code

plot(nd,dens,type='l')

5.2.9 Check whether data are normally distributed

Code

nd_data=rnorm(n=1000,mean=0,sd=2)
nd_data=sort(nd_data)

Code

non_nd_data=1:1000  # evenly spaced integers, clearly not normal (already sorted)

5.2.9.1 Method 1: histogram

Code

#define plotting region
par(mfrow=c(1,2)) 

#create histogram for both datasets
hist(nd_data, col='steelblue', main='Normal')
hist(non_nd_data, col='steelblue', main='Non-normal')

5.2.9.2 Method 2: Q-Q plot

Code

#define plotting region
par(mfrow=c(1,2)) 

#create Q-Q plot for both datasets
qqnorm(nd_data, main='Normal')
qqline(nd_data)

qqnorm(non_nd_data, main='Non-normal')
qqline(non_nd_data)

5.2.9.3 Method 3: Shapiro-Wilk Test

Null hypothesis (H0): the data are normally distributed.

If the p-value is >= 0.05, we fail to reject H0: the data are consistent with a normal distribution (this does not prove normality).

Code

#perform shapiro-wilk test
shapiro.test(nd_data)


    Shapiro-Wilk normality test

data:  nd_data
W = 0.99895, p-value = 0.8451

If the p-value is < 0.05, reject the null hypothesis: the data are not normally distributed.

Code

#perform shapiro-wilk test
shapiro.test(non_nd_data)


    Shapiro-Wilk normality test

data:  non_nd_data
W = 0.95481, p-value < 2.2e-16
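The decision rule above can be wrapped in a small helper that returns TRUE or FALSE. This is a sketch, not part of base R: the function name `looks_normal` and the default `alpha = 0.05` are assumptions for illustration.

```r
# Hypothetical helper: TRUE when the Shapiro-Wilk test does not reject
# normality at significance level alpha
looks_normal <- function(x, alpha = 0.05) {
  # shapiro.test() accepts between 3 and 5000 non-missing values
  stopifnot(length(x) >= 3, length(x) <= 5000)
  shapiro.test(x)$p.value >= alpha
}

looks_normal(1:1000)  # evenly spaced integers: FALSE, as in the test above
```

A TRUE result only means the test found no evidence against normality at the chosen level.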

5.3 student t distribution

5.3.1 one sample t test

5.3.2 two sample t test

5.3.3 Paired t test

5.3.4 Pearson correlation coefficient

5.4 F distribution

5.4.1 ANOVA

5.5 Chi-square

5.5.1 Chi-square goodness of fit test

5.5.2 Chi-square test of independence

5.6 Poisson Distribution

6 Reference

https://www.huber.embl.de/users/kaspar/biostat_2021/2-demo.html

https://www.youtube.com/watch?v=peEsXbdMY_4

https://www.youtube.com/watch?v=ETd-jPhI_tE

https://www.youtube.com/watch?v=kvmSAXhX9Hs

https://www.youtube.com/watch?v=RlhnNbPZC0A

https://www.youtube.com/watch?v=X5NXDK6AVtU

https://www.scribbr.com/statistics/probability-distributions/#:~:text=A%20probability%20distribution%20is%20a,using%20graphs%20or%20probability%20tables.

https://www.youtube.com/watch?v=Q_pO9NzWxPY

https://www.statology.org/test-for-normality-in-r/

https://en.wikipedia.org/wiki/Derangement

Code

sessionInfo()

R version 4.4.1 (2024-06-14)
Platform: aarch64-apple-darwin20
Running under: macOS 15.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Asia/Shanghai
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] gtools_3.9.5

loaded via a namespace (and not attached):
 [1] htmlwidgets_1.6.4 compiler_4.4.1    fastmap_1.2.0     cli_3.6.4        
 [5] tools_4.4.1       htmltools_0.5.8.1 rstudioapi_0.17.1 yaml_2.3.10      
 [9] rmarkdown_2.28    knitr_1.48        jsonlite_1.8.9    xfun_0.48        
[13] digest_0.6.37     rlang_1.1.5       evaluate_1.0.1
