Probability

Notes on Oxford lectures and solutions/answers to all problems from this book.

Buzzwords

Sample space, events, probability measure. Sampling with or without replacement. Conditional probability = partition of sample space, law of total probability/total expectation, Bayes’ Theorem. Independence.

variance, covariance, zero covariance does not imply independence, in normal distributions zero covariance = independence

Discrete radom variablers, pmf = probability mass function, Marginal and conditional distributions, first and second order linear difference equations (fibonacci), random walk, Gambler's ruin.

Poisson interpretation: The number of occurrences during a time interval of length λ\lambda is a Poisson-distributed random variable with expected value λ\lambda.

Po (λ)(\lambda) is approximation of Bin (n,λn)(n,\dfrac{\lambda}{n}) for large nn, derive from pmf of binomial.

Poisson limit therem, law of rare events, poisson approximation of binomial (less computationally expensive), exponential distribution is the poisson discrete equivalent

negative binomial is sum of geometric distributions, binomial is sum of bernoulli distributions

gamma(iai,b)gamma(\sum_{i}a_i,b) is sum of gamma distributions gamma(ai,b)gamma(a_i,b)

sum of nn exponential distributions Exp(λ)Exp(\lambda) is Gamma(n,λ)Gamma(n,\lambda)

probability generating function, branching processes, solution to probability of extinction s=G(s)s = G(s), random sum formula GN(GX(s))=GS(s)G_N(G_X(s)) = G_S(s) where S=X1+X2+...+XnS = X_1 + X_2 + ... + X_n, Xi XX_i ~ X iid and independent from NN. use it to prove E(S)=E(N)E(X)E(S) = E(N)E(X) (alternatively use total law of expectation) . This solves problems with stopping times too.

use pgfs to compute P(Xmod2=0)P(X mod 2 = 0), hint G(1),G(1)G(1), G(-1).

pgf helps to prove sum of independent distribution is another distribution use GX+Y(s)=GX(s)GY(s)G_{X+Y}(s) = G_{X}(s)G_{Y}(s) and that pgf defines uniquely the distribution

use pgf's to compute distribution. P(X=k)=k-th derivativeGX(0)P(X = k) = \text{k-th derivative} G_X(0), eval derivatives at 0, need to be easility differentiable

cts random variables, cdf, pdf

pdf has similar properties as pmf but is NOT a probability.

sample any cdf from uniform samples. FX1(U)F_{X}^{-1}(U)

E[X]=0P(X>x)E[X] = \int_{0}^{\infty}P(X>x), need XX to be non-negative random variable. Prove that using swap integrands (Tonelli's theorem).

Random sample, sums of independent random variables. Markov’s inequality, Chebyshev’s inequality, Weak Law of Large Numbers.

mixture distributions good for bayesian modelling

  • Eulers's formula (Riemann zeta function expressed with prime numbers) has cool probabilistic proof. Check sheet 2 last problem.

P(break stick into n pieces and have polygon)=P(all pieces are less than 12)=1P(all points lie in one semi circle)=1n2n1P(\text{break stick into n pieces and have polygon}) = P(\text{all pieces are less than } \dfrac{1}{2}) = 1 - P(\text{all points lie in one semi circle}) = 1- \dfrac{n}{2^{n-1}}

Chapter 1. Events and probability

Set of all possible outcomes Ω\Omega is called the sample space. S subset of Ω\Omega is called an event.

For events AA and BB we can do set operations:

  • ABA ∪ B means at least one of the two occurs

  • ABA ∩ B means both occur

  • ABA - B means AA occurs but BB does not

    We assign probabilities P(A)P(A) to events.

    Counting. Number of permutations of nn distinct elements is n! 2πnn+0.5enn! ~ 2π n^{n+ 0.5} e^{−n}

Binomial coefficient (Nk)=N!(Nk)!k!{N\choose k} = \frac{N!}{(N-k)! k!}.

Binomial theorem expands (x+y)n==(x+y)(x+y)(x+y)(x+y)^n = = (x + y)(x + y) · · · (x + y). Proof by counting.

Bijectionargument in counting problems

Example How many distinct non-negative intger-valued solutions of the equation x1+...xm=nx_1 +... x_m = n are there?

(n+m1m1){n+m-1\choose m-1} - use sticks argument

Lemma. Vandermonde’s identity (m+nk)=jk(mj)(nkj){m+n\choose k} = \sum_j^k {m\choose j}{n\choose k-j}

Use Breaking things down argument

Countable sets are those which you can label (i.e. map to the integer space), Uncountable sets cannot be labelled like R\R

Definition 1.5. A probability space is a triple (Ω,F,P)(Ω, F, P) where

  1. ΩΩ is the sample space,
  2. FF is a collection of subsets of ΩΩ, called events, satisfying axioms F1 –F3 below,
  3. PP is a probability measure, which is a function P:F[0,1]P : F → [0, 1] satisfying axioms P1 –P3 below.

The axioms of probability

FF is a collection of subsets of Ω, with:

  • F1 : ∅ ∈ FF. empty event is in FF
  • F2 : If AFA ∈ F, then also AcFA^{c} ∈ F. An event and its complementary are both in FF
  • F3 : If {Ai , i ∈ I} is a finite or countably infinite collection of members of FF, then AiF∪A_i ∈ F. FF has the notion of closure.

PP is a function from FF to R\R, with:

  • P1 : For all AF,P(A)0A ∈ F, P(A) ≥ 0.
  • P2 : P(Ω)=1P(Ω) = 1. All events have probability 1.
  • P3 : If {Ai , i ∈ I} is a finite PP or countably infinite collection of members of FF, and AiAj=A_i ∩ A_j = ∅ for i!=ji != j, then $P(∪A_i ) = \sum P(A_i) $ Distributivity of union over intersection.

P3 would not be true if it was just for pairwise sets. The above is stronger!

Theorem 1.9. Suppose that (Ω,F,P)(Ω, F, P) is a probability space and that A,BFA, B ∈ F. Then

  1. P(Ac)=1P(A)P (A^c ) = 1 − P (A);
  2. If ABA ⊆ B then P(A)P(B)P(A) ≤ P (B).

Definition 1.11. Let (Ω,F,P)(Ω, F, P) be a probability space. If A,BFA, B ∈ F and P(B)>0P(B) > 0 then the conditional probability of AA given BB is

P(AB)=P(AB)P(B)P(A|B) = \frac{P(A ∩ B)}{P(B)}

Probability space is a powerful thing. You have all the axioms above to be true!

Lemma If (Ω,F,P)(Ω, F, P), then for any event BB, if you swap P(A)withQ(A)=P(AB)P(A) with Q(A) = P(A|B) then (Ω,F,Q)(Ω, F, Q) is a probability space too! That is if you condition your probability space on certain event you still have all the axioms.

Independece Events AA and BB are independent if P(AB)=P(A)P(B)P(A ∩ B) = P(A)P(B).

A family of events is independend if P(Ai)=P(Ai)P(∩ A_i) = \prod P(A_i)

PAIRWISE INDEPENDENT DOES NOT IMPLY INDEPENDENCE.

AA and BB independent imply AA and BcB^c are independent.

Theorem 1.20 (The law of total probability). Suppose {B1,B2,...}\{B1 , B2 , . . .\} is a partition of ΩΩ by sets from FF, such that P(Bi)>0P(B_i ) > 0 for all i1i ≥ 1. Then for any AFA ∈ F,

P(A)=i1P(ABi)P(Bi)P(A) = \sum_{i≥1} P(A|B_i)P(B_i).

(partition theorem)

Bayes theorem = Conditional probability + law of total probability

P(AB)=P(BA)P(A)P(B)=P(BA)P(A)P(BAi)P(Ai)P(A|B) = \dfrac{P(B|A)P(A)}{P(B)} = \dfrac{P(B|A)P(A)}{\sum P(B|A_i)P(A_i)}

Simpson’s paradox

simpson_paradox.png

Problems

Solutions to 1.11 from the book.

Q1. Condition on first event and do linear differencing equation. Homogeneous and particular solution.

pn=1/6+2/3pn1p_n = 1/6 + 2/3 p_{n-1}. Can use indeuction too.

Note this is a binomial distribution and we compute probability we have even outcome. Can expend (1+x)n+(1n)n(1+x)^{n} + (1-n)^n

Q2. No. Finate spaces should be power of two.

Q3. By induction. Use union operation is associative P(AB)=P(A)+P(B)P(AB)P(A \cup B) = P(A) + P(B) - P(A \cap B)

Q4. By Q3 and P(A1A2...An)=1P((A1A2...An)c)=1P(A1c...Anc)P(A_1 ∪ A_2 ∪ . . . ∪ A_n) = 1 − P ((A_1 ∪ A_2 ∪ . . . ∪ A_n)^{c} ) = 1 − P(A_1^{c} ∩ . . . ∩ A_n^{c})

Q5. Example of pairwise independence (3 events) that does not imply independence of all 3 events P(ABC)P(A)P(B)P(C)P(A \cap B \cap C) \neq P(A)P(B)P(C).

Q6. Conditional probability + Bayes. 79/140, 40/61

Q7. 3 spades sequences/all sequences =13.12.1113.12.50= \dfrac{13.12.11}{13.12.50}

Q8. Binomial distribution and expand Stirling.

Q9. Contidional probability + Binomial disribution + Vandermonde’s identity

Q10. Skip physics

Q11. Law of total probability (parititon theorem). They want to get the eight element so can do manually with iteration. I don't see easy way to solve this difference equation by hand?

#TODO Q12. Extra hard, did not solve it. stack, normal numbers, Champernowne constant

Q13. Conditional probability + algebra iteration... Goal is to get differencing equation in each variable e.g. f(cn+1,cn,cn1)=0f(c_{n+1},c_n,c_{n-1}) = 0

Q14. a) Inclusion-exclusion principle. b) e1,1e1e^{-1}, 1-e^{-1}

Q15. Condition on k cards match. Then use incllusion exclusion principle.

Q16. Conditional probability on when 8:45 and 9:00 trains come. e12+e24+e44\dfrac{e^{-1}}{2}+\dfrac{e^{-2}}{4}+\dfrac{e^{-4}}{4}

Q17. #TODO

Q18 #TODO

Q19. n=6n=6

Chapter 2. Discrete random variables

Encode information about an outcome using numbers

Definition 2.1. A discrete random variable X on a probability space (Ω,F,P)(Ω, F, P) is a function X:ΩRX : Ω → R such that

  • (a) {ωΩ:X(ω)=x}F\{ω ∈ Ω : X(ω) = x\} ∈ F for each xRx ∈ R,
  • (b) ImX:={X(ω):ωΩ}ImX := \{X(ω) : ω ∈ Ω\} is a finite or countable subset of RR.

(a) says {ωΩ:X(ω)=x}={X=x}\{ω ∈ Ω : X(ω) = x\} = \{X = x\} lives in FF, that is it is an event and we can assign probability.

(b) is the definition of discrete.

Definition 2.3. A probability mass function (pmf) of a random variable is pX(x)=(X=x)p_{X}(x) = \P(X = x) s.t.:

  • pX(x)0p_X(x) ≥ 0 for all xx,
  • pX(x)=1\sum p_X(x) = 1

Classic discrete distributions:

  • Bernoulli
  • Binomial
  • Geometric
  • Poisson

Definition 2.6 The expectation (or expected value or mean) of XX is

E[X]=xImXxP(X=x)E[X] = \sum_{x∈ImX} xP(X = x) provided that xImXxP(X=x)<inf\sum_{x∈ImX}|x|P (X = x) < inf, otherwise it does not exist. We require absolutely convergent series sum.

The expectation of XX is the ‘average’ value which XX takes.

Theorem. E[f(X)]=xImXf(x)P(X=x)E[f(X)] = \sum_{x∈ImX} f(x)P(X = x)

Linearity of expectation:

  • E[aX+b]=aE[X]+bE[aX + b] = aE[X] + b
  • E[X+Y]=E[X]+E[Y]E[X+Y] = E[X] + E[Y]

Definition 2.11. For a discrete random variable XX, the variance of XX is defined by:

var(X)=E[(XE[X])2]=E[X2](E[X])2var(X) = E[(X − E[X])^2] = E[X^2] - (E[X])^2, provided that this quantity exists.

Variance gives size of fluctuations around the expectation.

var(Y)=var(aX+b)=a2var(X)var(Y) = var(aX + b) = a^2var(X)

Theorem 2.14 (Law of total expectation = Partition theorem for expectations).

E[X]=BE[XB]P(B)E[X] = \sum_{B} E[X|B]P(B)

Joint pmf is pX,Y(x,y)=P(X=x,Y=y)=P({X=x}{Y=y})p_{X,Y}(x,y) = P(X=x,Y=y) = P(\{X=x\} \cap \{Y=y\})

Marginal distribution exists for joint distribution and is just integrating/summing out one of the variables pX(x)=ypX,Y(x,y)p_{X}(x) = \sum_{y} p_{X,Y}(x,y)

Theorem 2.23 If XX and YY are independent, then E[XY]=E[X]E[Y]E[XY] = E[X]E[Y]. Reverse is NOT true.

Definition cov(X,Y)=E[(XE[X])(YE[Y])]=E[XY]E[X]E[Y]cov (X,Y) = E[(X − E[X])(Y − E[Y])] = E[XY] - E[X]E[Y].

i.i.d. = independent and identically distributed

Problems

Solution from chapter 2.6 in book.

Q1. f(x)=x!(xk)!f(x) = \dfrac{x!}{(x-k)!}

Q2. rp\dfrac{r}{p}

Q3. Discrete RV with zero variance then, the RV is eqaul to the mean. In cts RV, X=E[X]X=E[X] almost surely (degenerate distribution).

Q4. α<1\alpha < -1 need convergence, c=1/ζ(α)c = 1/\zeta(-\alpha) Rieman zeta function

Q5. Lack of memory proerty of Geometric distribution. Kvot bilo bilo.

Q6.

  1. Proof by visualization.
  2. 3(23)k3(\dfrac{2}{3})^{k}
  3. coupun collection problem, linearity of exp. 3(1+1/2+1/3)3*(1+1/2+1/3)

Q7. harmonic series log(n)log(n)

Q8. #TODO

Q9. Expected tosses till see nn consecutive heads 1pnpn(1p)\dfrac{1-p^n}{p^n(1-p)}. Write it using Markov chain state technique and goal is to compute e0e_0, with boundary condition eH..HH=en=0e_{H..HH} = e_n = 0 To solve it you need to take difference of consecutive equations and do telescoping sum like idea.

Q10. #TODO

Chapter 3. Difeerence equations and random walks

Theorem 3.3 The general solution (un)n>=0(u_n)_{n>=0} to a difference equation:

j=0kajun+j=f(n)\sum_{j=0}^{k} a_{j}u_{n+j} = f(n)

can be written as un=vn+wnu_n = v_n + w_n where:

  • vnv_n is a particular solution to the equation (resembles f(n)f(n))
  • wnw_n solves the homogeneous equation j=0kajwn+j=0\sum_{j=0}^{k}a_{j}w_{n+j} = 0

To solve the homogeneous equation when you have k>1k>1 you need to get the auxilary equation - substitute in the homogenous equat un=xnu_n = x^{n} and the roots of it gives you the general form of the solution to the homogeneous equation. For k=2k=2 it looks like: wn=A1x1n+A2x2nw_n = A_1 x_{1}^{n} + A_2 x_{2}^{n}

If x1=x2x_1=x_2, then try wn=(A+Bn)x1nw_n = (A + Bn)x_{1}^{n}

To solve the particular equation try solutions similar to f(n)f(n) if it fails try the next most complicated thing.

Gamblers ruin:

un=pun+1+qun1u_n = pu_{n+1} + qu_{n-1}, where un=P(bankruptcy)u_n = P(bankruptcy) if gambler has nn money. Boundary conditions: u0=1,uM=0u_0 = 1, u_M = 0.

This is second order difference equation. If you remove the boundary MM you need to take limits limM>infun(M)=P(hit 0 before )M\lim_{M->inf}u_n^{(M)} = P(\text{hit 0 before )M}.

gamblers_ruin.png gamblers_hit0.png

P(everhit0)P(ever hit 0) is limM>infun(M)\lim_{M->inf}u_n^{(M)}. To prove that formally you need to use that for an increasing set of events A1A2A3...AM...A_1 ⊆ A_2 ⊆ A_3 ⊆ ... ⊆ A_M ... you have P(Ak)=limk>infP(Ak)P(\cup A_k) = lim_{k->inf} P(A_k).

Gamblers ruin for expectation en=e_n = number of steps to hit absorbing stage.

en=p(1+en+1)+q(1+en1)e_n = p(1+e_{n+1}) + q(1+e_{n-1}), with bc e0=eM=0e_0 = e_M = 0

Problems

Solution to problems from Chapter 10.5

Q1. P(twoiidrandomwalksendinthesameposition)P(two iid random walks end in the same position) can be expressed using two expressions. Think about the two walks separately and combined as in one walk.

Q2. Gamblers ruin with different parameters.

Q3. Condition on first step and use gamblers ruin solution for n=1,1n = 1,-1. 1N\dfrac{1}{N}

Q4. Skipped, to do property need to take limit as M>infM->inf and define anMa_n^{M}

#TODO finish questions, might need to read chapter. It is more comprehensive than the lecture notes

Chapter 4 Probability generating functions

The probability generating function of a discrete random variable is a power series representation of its pmf. It is defined only for discrete random variables.

GX(s)=E[sX]=kskP(X=k)G_X(s) = E[s^{X}] = \sum_{k} s^kP(X = k)

where sRs\in \R such that the expectation exists.

Theorem 4.2. The distribution of XX is uniquely determined by its probability generating function, GXG_X .

The kkth derivative of GG evaluated at zero gives P(X=k)P(X = k).

Theorem 4.3 If XX and YY are independent, then GX+Y(s)=GX(s)GY(s)G_{X+Y}(s) = G_X(s)G_Y(s)

Use the above two theorems to prove that sum of nn independent Bernoulli random variables is a Binomial random variable.

Sum of nn iid Poisson rvs is Poisson rv with parameter = sum of parameters.

Calculate moments by taking derivative of the pgf and evaluate at s=1s = 1

Theorem 4.8. Let X1,X2,...X_1 , X_2 , ... be i.i.d. non-negative integer-valued random variables with p.g.f. GX(s)G_X (s). Let NN be another non-negative integer-valued random variable, independent of X1,X2,...X_1, X_2 ,... and with p.g.f. GN(s)G_N(s). Then the p.g.f. of i=1NXi\sum_{i=1}^{N} X_i is GN(GX(s))G_N(G_X(s)).

This chain trick appears in branching processes. Start with one individual which gives birth to many children. In generation nn each individual in the population gives children. The pgf of total number of individuals in generation n+1n+1 satisfies Gn+1(s)=G(Gn(s))G_{n+1}(s) = G(G_n(s)) Where GG is the pgf of random variable giving number of children for 1 individual. By induction you can express Gn+1(s)G_{n+1}(s) as nested G(s)G(s) n times.

Extintion probability.

s=G(s)s = G(s) is solution to probability of extinction!

q=kqkP(X=k)=G(q)q = \sum_{k}q^k P(X=k) = G(q), where we condition on the number of children of the first individual.

This equation always have solution at 1. However you cannot solve it for qq.

You need to go this way: q=P(n{Xn=0})=limn>infP(Xn=0)=limn>infGn(0)q = P(\cup_{n} \{X_n = 0\}) = lim_{n->inf} P(X_n = 0) = lim_{n->inf}G_n(0) using the fact about increasing sequences of events.

It turns out that the question of whether the branching process inevitably dies out is determined by the mean number of children of a single individual. Then we find that there is a positive probability of survival of the process for ever if and only if µ>1µ > 1. (Theorem 4.15 proves it)

Problems

Problems from Chapter 4.5 from the book

Q1. $P(X = k) = u_{k-1}-u_kS

Q2. 167((136)49)\dfrac{1}{6^7}({13\choose 6} - 49), need to count number of ways you can sum 7 number and equal to 14 then subtract when any of the numbers is greater than 6 (could be 7 or 8 only, sum 6 numbers is minimum 6).

Q3. Geometric distribution with p=1/3p=1/3. Get pgf of geo + some arithmetic.

probability of winning are sum of geometric series. Mean durtion of game is 3.

Q4. Sum of NN alternative geometric distributions.

Q5.

  • a) GN(GX(s))G_N(G_X(s)) where N Alternative GeometricN ~ \text{Alternative Geometric} and X Ber(1/2)X ~ Ber(1/2), differentiate pgf many times and evaluate at 0
  • b)

Q6.

  • a) npnp, np(1p)np(1-p)
  • b) 1+(12p)n2\dfrac{1+(1-2p)^n}{2}, eval pgf at -1
  • c) evaluate pgf at smart points trick

In this case the pgf evaluated at 3rd non trivial roots of unity w3=1w^3 = 1 is GX(w)=1w2P(X%3=1)wP(X%3=2)G_{X}(w) = 1 - w^2 P(X \% 3 = 1) - w P(X \% 3 =2), the pgf is known at w1,w2w_1, w_2, so you can compute the probabilities.

Q7. Use GX+Y(s)=GX(s)GY(s)G_{X+Y}(s) = G_{X}(s) G_{Y}(s). Then use bayesien theorem to prove the condinitonal probability is binomial distribution.

Q8. Skipped

Q9. Coupon-collecting problem, geometric distributions, pgf is known.

Q10. derivative of pgf evaluated at 1 is mean - directly from derivative of infinite sum series. for second prove need to use ϕ(1)=1\phi(1) = 1.

  • aa+bab\dfrac{a}{a+b-ab} two conditional probabilities from A and B points of view.
  • distribution is something like geometric
  • condition expectation. eA=(1b)eB+1e_{A} = (1-b)e_{B} + 1, similarly for BB. Answer 2aa+bab\dfrac{2-a}{a+b-ab}

Q11.

  • a) F=i=1NIiF = \sum_{i = 1}^{N} I_{i} sum of independent Bernoulli. GF(s)=GN(GI(s))G_F(s) = G_N(G_I(s))
  • b) directly from pgf definition, pgf denies distribution uniquely
  • c) GN(s)=GF+S(s)=GF(s)GS(s)=GN(s+12)2G_N(s) = G_{F+S}(s) = G_{F}(s) G_{S}(s) = G_{N}(\dfrac{s+1}{2})^2 last eq come from a).

Chapter 5. Continuous random variables

Recall that random variable is a function which maps Ω\Omega the outcome space to R\R. Discrete random variables have Im(X)Im(X) to be countable set.

Definition 5.2. The cumulative distribution function (cdf) of a random variable XX is the function FX:R[0,1]F_X : \R → [0, 1] defined by FX(x)=P(Xx)F_X(x) = P(X \leq x).

FXF_X is non-decreasing.

CTS random variable:

cts_rv_defn.png

WARNING: fX(x)f_X(x) IS NOT A PROBABILITY.

For cts random variable P(X=x)=0P(X = x) = 0 for any xRx \in \R

expectation_cts.png

Independece in cts random variables is defined through the pdf:

independence_cts.png

But can be defined through cdf too.

var(X+Y)=var(X)+var(Y)+2cov(X,Y)var(X + Y) = var(X) + var(Y) + 2cov(X,Y)

NB Standard normal random variables XX and YY are independent if and only if their covariance is 0. This is a nice property of normal random variables which is not true for general random variables, as we have already observed in the discrete case.

Problems

Answers on problems from chapter 5.8 in the book.

Q1. E(X)=0E(X) = 0 (substitution + symmetry), var(X)=2c2var(X) = 2c^{-2} - integration by parts + substitution.

Q2. for wNw \in \N for Poisson XX we have P(Xw)=1Γ(w,λ)Γ(w)P(X \geq w) = 1 - \dfrac{\Gamma(w,\lambda)}{\Gamma(w)}, see

The number of occurrences before time λ is at least w if and only if the waiting time until the wth occurrence is less than λ.

Q3. Skipped

Q4. trick P(Xx)=P(Xx)P(Xx)P(|X| \leq x) = P(X \leq x) - P(X \leq -x)

E(Y)=2πE(Y) = \sqrt{\dfrac{2}{\pi}}

var(Y)=12πvar(Y) = 1 - \dfrac{2}{\pi}, integrate by parts, use pdf integrates to 1

Q5. shows how to sample uniform distribution

Q6. sample any other distribution from uniform distribution

Q7. Integration question ,need to swap integrands (use Tonelli's theorem)trick

Q8. skipped, basic principles

Q9. use indicator function?

Q10. cdf is 1ey+21y1- e^{-\dfrac{y+2}{1-y}}, take derivative to find pdf

Q11. integrate arctan(x)arctan(x) from zero to inf to get the constant, (2/π)arctan(x)(2/\pi)arctan(x)

Q12. need to define pdf of the angle theta and the distance from center needle to the strips. (can simulate π\pi from this problem.) wiki

Q13. P(break stick into n pieces and have polygon)=P(all pieces are less than12)=1P(all points lie in one semi circle)=1n2n1P(\text{break stick into n pieces and have polygon}) = P(\text{all pieces are less than} \dfrac{1}{2}) = 1 - P(\text{all points lie in one semi circle}) = 1- \dfrac{n}{2^{n-1}}

Q14. cdf is y3+y\dfrac{y}{3+y}

Chapter 6. Random samples and the weak law of large numbers

Definition 6.1. Let X1,X2,...,XnX_1,X_2 , ... , X_n denote i.i.d. random variables. Then these random variables are said to constitute a random sample of size nn from the distribution.

Let X1,X2,...,XnX_1,X_2 , ... , X_n denote i.i.d. random variables with mean μ\mu and variance σ2\sigma^2. Then Xn=i=1nXi/nX_n = \sum_{i=1}^{n}X_i/n has E(Xn)=μE(X_n)= \mu and var(Xn)=σ2nvar(X_n) = \dfrac{\sigma^2}{n}.

wlln.png

Theorem 6.6. Markov's inequality gives upper bound for a random variable to be larger than certain threshhold.

Theorem 6.7. Chebyshev's inequality tells you how far from the mean a random variable can deviate.

Distributions

discrete_distributions.png

cts_distributions.png

Problem sheets answers

Sheet 1

Q1. 11!2!,210!2!,6!2!\dfrac{11!}{2!}, 2\dfrac{10!}{2!}, \dfrac{6!}{2!}

Q2. (100!)2(100!)^{2}

Q3. 169\dfrac{1}{6^9}

Q4. (kr){k\choose r}, then condition on largest element.

Q5. basic set theory holds for probability of events

Q6. a) P(BCAc)P(B \cup C \cap A^{c}). b)141

Q7. a) 23, 365...(365n+1)365n\dfrac{365...(365-n+1)}{365^n} b) 1(364365)n1 - (\dfrac{364}{365})^{n}

Q8. Inclusion exclusion principle P(no coorect hook)=P(i=1nAi)=112!+13!...P(\text{no coorect hook}) = P(\cup_{i=1}^{n} A_{i}) = 1 - \dfrac{1}{2!} + \dfrac{1}{3!} ...

number of arrangements of no correct hook = n!×P(no correct hooks)n! \times P(\text{no correct hooks})

d) P(exactly r correct hooks)=arrangements with exacly r correctall arrangements=(nr)(nr)!knr(1)kkn!P(\text{exactly r correct hooks}) = \dfrac{\text{arrangements with exacly r correct}}{\text{all arrangements}} = \dfrac{{n\choose r}(n-r)!\sum_{k}^{n-r}\frac{(-1)^k}{k}}{n!}

Remark. probability no correct hooks converges to 1e11-e^{-1}

Sheet 2 Q1. Example that pairwice independence does not imply independence

Q2. 9595+2×99.5 32\dfrac{95}{95+2 \times 99.5} ~ 32%

Q3.

  • a) mm+n\dfrac{m}{m+n}
  • b) mm+nm1(m+n1)+nm+nmm+n1\dfrac{m}{m+n}\dfrac{m-1}{(m+n-1)} + \dfrac{n}{m+n}\dfrac{m}{m+n-1}
  • c) Bayes from a) and b)

Q4 Use Bayes. Clarification: pp proportion are conservative, 1p1-p are liberal.

Q5.

  • a) (2613)1226{26\choose 13}\dfrac{1}{2^{26}}

  • b) (2613)2{26\choose 13}^2 // (5226){52\choose 26}

Second is bigger. Intution is that in a) you reset every time, and in b) you are pulled towards balance.

Q6. Euler's formula for Riemann's zeta function.

  • a) 1ks\dfrac{1}{k^{s}}
  • b) directly from definition P(AB)=P(A)P(B)P(AB) = P(A)P(B), but for infinite number of events (pairwise dependence is not enough)
  • c) if an integer is not divisibel by any prime number it muist be 1!

Sheet 3

Q1. npnp

Q2. proof by visualization

Q3.

  • a) (1p)k(1-p)^k
  • b) memoryless propery of geometric distribution

Q4. Complete th exponent and use sum of pmf is 1.

Q5.

  • a) for large nn binomial becomes poisson
  • b) eλe^{-\lambda}

Q6. 6

Q7. Coupon collector problem.

  • a) n1n\dfrac{n-1}{n}, geometric(n1n\dfrac{n-1}{n})

  • b) Geo(nk+1n\dfrac{n-k+1}{n})

  • c) harmonic series

Sheet 4

Q1. n+12,(n+1)(4n+1)12\dfrac{n+1}{2}, \dfrac{(n+1)(4n+1)}{12}

Q2. zero covariance does not imply independence

Q3.

  • a) independent so can just multiply
  • b) sum of poisson independen is poisson (derive from pgf)
  • c) binomial with p=λλ+μp = \dfrac{\lambda}{\lambda + \mu}
  • use c

Q4.

  • bayes, in numerator P(X=k,Y=n+1k)P(X=k,Y=n+1-k)
  • b) (1p)2k(2pp2)(1-p)^{2k}(2p-p^2), use $pk = p{\ge k} - p_{\ge k+

Q5.

  • n×eλn\times e^{-\lambda}
  • P(Bin(m,p)=k)P(Bin(m,p) = k), sum of binomial times poisson which turns out to be Po(λp)Po(\lambda p)

Q6. from axioms?

Q7. first and second order difference equations. solvable using particular and homogeneous solutions. For particular solutions - strategy try next most complex thing.

Sheet 5

Q1. 2qn+pn=1,qn=pn1+qn122q_n+p_n = 1, q_n = \dfrac{p_{n-1}+q_{n-1}}{2} 13\dfrac{1}{3} first order recurence equation.

Q2. 19

Q3.

  • a) nn
  • b) n+12n\dfrac{n+1}{2n}
  • c) M2\dfrac{M}{2}, en=n(Mn)e_n = n(M-n)

Q4.

  • a) sum of geometric series
  • b) Take derivatives and evaluate at 1 E(X)=pX(1)=1pE(X) = p'_{X}(1) = \dfrac{1}{p}. var(X)=1pp2var(X) = \dfrac{1-p}{p^2}

Q5.

  • a) condition on first two throws, second order recurence.
  • b) last 3 tosses are fixed, 18rn3\dfrac{1}{8}r_{n-3}
  • c) take derivative of probability generating function
  • d) hmmmm #TODO, maybe need two probabilities pn,qnp_n, q_n and use trick like in Q1?

Q6.

  • a) N(N1)2\dfrac{N(N-1)}{2}, need to use e1e_1 from Q3.
  • b) always need to go the whole way around, by symmetry the answer is 1N1\dfrac{1}{N-1}

Sheet 6

Q1.

  • a) sas^a
  • b) snpY(sm)s^n p_{Y}(s^m)

Q2.

  • a) negative binomial - number of trials up to and including the kk-th success.
  • b) negative binomial is sum of geometric distributions! pGeo(s)m=(ps1(1p)s)mp_{Geo}(s)^m = (\dfrac{ps}{1-(1-p)s})^m

Q3.

  • a) law of total expectation, or use Theorem 4.8 GZ(s)=GN(GX(s))G_Z(s)=G_N(G_X(s)) and play around with pgf derivatives evaluated at 1.
  • b) λp\lambda p
  • c) no, E[Z]E[Z] is not longer E[N]E[X1]E[N]E[X_1], your proof in a) using law of total expecation uses the independence

Q4. GX(1)+12\dfrac{G_X(-1)+1}{2}

Q5. G(s)=1/12+2s/3+s2/4G(s) = 1/12 + 2s/3 + s^2/4, G(G(s))G(G(s)), P(extinctionafter2minutes)=G(G(0))P(extinction after 2 minutes)=G(G(0))

Q6.

  • a) 2p2p, G(s)=ps2+1pG(s) = ps^2+1-p
  • b) 1pp\dfrac{1-p}{p}
  • c) #TODO, maybe need to revisit notes, or look Miro's solution

Sheet 7

Q1. pictures of distributions

Q2. check if the integral of these functions exists

Q3.

  • a) 12\dfrac{1}{2}, 112\dfrac{1}{12}
  • b) ab\dfrac{a}{b}

Q4.

  • a) eλxe^{-\lambda x}
  • b) eλaeλbe^{-\lambda a} - e^{-\lambda b}
  • c) Bayes theorem + a) memoryless property of exponential distribution
  • d) take arcsin and use that XX is exponential
  • e) exponential, with lower rate
  • f) exponential distribution is the continuous version of the geometric distribution, capped exp is geometric. proof

Q5. substract mean, divide by std to get standard normal

  • a) 0.454
  • b) 0.921 - 0.454
  • c) 0.92120+20×0.92119×(10.921)0.921^20 + 20 \times 0.921^19 \times(1-0.921)

Q6.

  • dont need pdf to get expecation and variance see
  • to get pdf just take derivate of cdf, xπ\sqrt\dfrac{x}{\pi}

Q7.

  • a) FX(X)F_X(X) is uniform distribution, just go through definition FX(FX1(x))=xF_X(F_{X}^{-1}(x)) = x
  • b) FX(x)F_X(x) - gi>ves a way to generate XX from uniform distribution UU
  • c) thats exponential distribution, get the cdf, find its inverse and apply b)

Sheet 8

Q1. Integrate, choose constants which would make the integrals equal to 1. for independence check with marginal distributions fX,Y(x,y)=fX(x)fY(y)f_{X,Y}(x,y) = f_{X}(x)f_{Y}(y)

Q2

  • a) P(min(Xi)>t)=iP(Xi>t) Exp(iλi)P(min(X_i)>t) = \prod_{i}P(X_i>t) ~ Exp(\sum_{i}\lambda_{i})
  • b) treat using indicator variable iP(Ti>1)=ieλi\sum_{i}P(T_i > 1) = \sum_{i}e^{-\lambda_{i}}
  • c) P(M<median)=0.5P(M < median) = 0.5

Q3. integrate x2x^2 from 0 to 1. Area under the shaded region..

Q4. chebyshev

Q5. chebyshev

Q6.

  • a) 14\dfrac{1}{4}, 316\dfrac{3}{16}
  • b) 0 when ij>1|i-j| > 1, 116\dfrac{1}{16} otherwise
  • c) use b, n4\dfrac{n}{4}, 7n16\dfrac{7n}{16}
  • d) n4\dfrac{n}{4}, 3n16\dfrac{3n}{16}

Q7 .WLLN