Statistical Signal Processing*
(* may contain traces of cats – not suitable for people allergic to humor – all information supplied without guarantee)
i.i.d.: independently identically distributed

1. Math

Constants: π ≈ 3.14159, e ≈ 2.71828, √2 ≈ 1.414, √3 ≈ 1.732

Binomials, trinomials:
(a ± b)² = a² ± 2ab + b²    a² − b² = (a − b)(a + b)
(a ± b)³ = a³ ± 3a²b + 3ab² ± b³
(a + b + c)² = a² + b² + c² + 2ab + 2ac + 2bc

Sequences and series:
Arithmetic sum: ∑_{k=1}^n k = n(n+1)/2
Geometric sum: ∑_{k=0}^n q^k = (1 − q^{n+1})/(1 − q)
Exponential series: ∑_{n=0}^∞ z^n/n! = e^z

Means: x̄_ar = (1/N)·∑_i x_i ≥ x̄_geo = (∏_i x_i)^{1/N} ≥ x̄_hm = N/∑_i (1/x_i)
(arithmetic ≥ geometric ≥ harmonic mean; median: middle element of a sorted list)

Inequalities:
Bernoulli inequality: (1 + x)^n ≥ 1 + nx
Triangle inequality: |x| − |y| ≤ |x ± y| ≤ |x| + |y|
Cauchy-Schwarz inequality: |x^⊤ y| ≤ ‖x‖·‖y‖

Sets, De Morgan: (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ,  (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ

1.1 Exponential and Logarithm
e^x := lim_{n→∞} (1 + x/n)^n,  e ≈ 2.71828
a^x = e^{x·ln a}    ln(x^a) = a·ln x    log_a x = ln x / ln a
ln(x/a) = ln x − ln a    ln x ≤ x − 1    log(1) = 0

1.2 Matrices A ∈ K^{m×n}
A = (a_ij) ∈ K^{m×n} has m rows (index i) and n columns (index j).
(A·B)^⊤ = B^⊤·A^⊤    (A + B)^⊤ = A^⊤ + B^⊤
(A^⊤)^{−1} = (A^{−1})^⊤    (A·B)^{−1} = B^{−1}·A^{−1}
dim K^n = n = rank A + dim ker A    rank A = rank A^⊤

1.2.1 Square matrices A ∈ K^{n×n}
regular/invertible/non-singular ⇔ det(A) ≠ 0 ⇔ rank A = n
singular/non-invertible ⇔ det(A) = 0 ⇔ rank A ≠ n
orthogonal ⇔ A^⊤ = A^{−1} ⇒ det(A) = ±1
symmetric: A = A^⊤    skew-symmetric: A = −A^⊤

1.2.2 Determinant of A ∈ K^{n×n}: det(A) = |A|
det [A B; 0 D] = det [A 0; C D] = det(A)·det(D)
det(A) = det(A^⊤)    det(A^{−1}) = det(A)^{−1}
det(A·B) = det(A)·det(B) = det(B)·det(A) = det(B·A)
If A has two linearly dependent rows/columns ⇒ |A| = 0

1.2.3 Eigenvalues (EW) λ and eigenvectors (EV) v: A·v = λ·v
det A = ∏_i λ_i    tr A = ∑_i a_ii = ∑_i λ_i
Eigenvalues: det(A − λ·1) = 0    Eigenvectors: v_i ∈ ker(A − λ_i·1)
The eigenvalues of triangular/diagonal matrices are the elements of the main diagonal.

1.2.4 Special case: 2×2 matrix A = [a b; c d]
det(A) = ad − bc    tr(A) = a + d
[a b; c d]^{−1} = 1/det(A) · [d −b; −c a]
λ_{1/2} = tr(A)/2 ± √( (tr(A)/2)² − det(A) )

1.2.5 Differentiation (matrix calculus)
∂(x^⊤ y)/∂x = ∂(y^⊤ x)/∂x = y
∂(x^⊤ A y)/∂A = x·y^⊤
∂(x^⊤ A x)/∂x = (A + A^⊤)·x
∂ det(BAC)/∂A = det(BAC)·(A^{−1})^⊤

1.2.6 Derivative rules (∀ λ, µ ∈ R)
Linearity: (λf + µg)′(x) = λf′(x) + µg′(x)
Product: (f·g)′(x) = f′(x)·g(x) + f(x)·g′(x)
Quotient: (f/g)′(x) = (g(x)·f′(x) − f(x)·g′(x)) / g(x)²
Chain rule: (f(g(x)))′ = f′(g(x))·g′(x)

1.3 Integrals
Integration by parts: ∫ u·w′ dx = u·w − ∫ u′·w dx
Substitution: ∫ f(g(x))·g′(x) dx = ∫ f(t) dt with t = g(x)

Antiderivative F(x) = ∫ f(x) dx (+ C) | f(x) | f′(x):
x^{q+1}/(q+1)           | x^q       | q·x^{q−1}
(2/(3a))·√((ax)³)       | √(ax)     | a/(2√(ax))
x·ln(ax) − x            | ln(ax)    | 1/x
e^{ax}·(ax − 1)/a²      | x·e^{ax}  | e^{ax}·(ax + 1)
e^x                     | e^x       | e^x
a^x/ln(a)               | a^x       | a^x·ln(a)
−cos(x)                 | sin(x)    | cos(x)
cosh(x)                 | sinh(x)   | cosh(x)
−ln|cos(x)|             | tan(x)    | 1/cos²(x)

Frequently used integrals:
∫ e^{at}·sin(bt) dt = e^{at}·(a·sin(bt) − b·cos(bt))/(a² + b²)
∫ t²·e^{at} dt = e^{at}·((at − 1)² + 1)/a³
∫ dt/√(at + b) = 2·√(at + b)/a
∫ x·e^{ax²} dx = e^{ax²}/(2a)
∫ t·e^{at} dt = e^{at}·(at − 1)/a²

1.3.1 Volume and surface of solids of revolution about the x-axis
V = π·∫_a^b f(x)² dx    O = 2π·∫_a^b f(x)·√(1 + f′(x)²) dx
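As a quick numerical sanity check (not part of the original sheet), a minimal NumPy sketch of the 2×2 eigenvalue shortcut from 1.2.4; the example matrix is arbitrary:

```python
import numpy as np

# 2x2 eigenvalue shortcut from 1.2.4: lambda_{1,2} = tr(A)/2 +/- sqrt((tr(A)/2)^2 - det(A))
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
tr, det = np.trace(A), np.linalg.det(A)
lam_shortcut = np.array([tr / 2 + np.sqrt((tr / 2) ** 2 - det),
                         tr / 2 - np.sqrt((tr / 2) ** 2 - det)])

lam_ref = np.linalg.eigvals(A)                  # reference eigenvalues
print(np.sort(lam_shortcut), np.sort(lam_ref))  # both: [2. 5.]

# 1.2.3: det(A) = prod(lambda_i), tr(A) = sum(lambda_i)
assert np.isclose(det, np.prod(lam_ref)) and np.isclose(tr, np.sum(lam_ref))
```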
2. Probability Theory Basics

2.1 Combinatorics
Number of possible variations/combinations when choosing k out of at most n elements, or distributing k elements over n slots:

                      | with order     | order irrelevant
without repetition    | n!/(n − k)!    | C(n, k)
with repetition       | n^k            | C(n + k − 1, k)

Permutations of n elements containing groups of k₁, k₂, ... identical elements: n!/(k₁!·k₂!·...)
Binomial coefficient: C(n, k) = C(n, n − k) = n!/(k!·(n − k)!)
C(n, 0) = 1, C(n, 1) = n, C(4, 2) = 6, C(5, 2) = 10, C(6, 2) = 15

2.2 The probability space (Ω, F, P)
Sample space Ω = {ω₁, ω₂, ...} with outcomes ω_j ∈ Ω
Event algebra F = {A₁, A₂, ...} with events A_i ⊆ Ω
Probability measure P: F → [0, 1]

2.3 Probability measure P
Laplace experiment: P(A) = |A|/|Ω|
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

2.3.1 Axioms of Kolmogorov
Non-negativity: P(A) ≥ 0 ⇒ P: F → [0, 1]
Normalization: P(Ω) = 1
(σ-)Additivity: P(⋃_{i=1}^∞ A_i) = ∑_{i=1}^∞ P(A_i) if A_i ∩ A_j = ∅ ∀ i ≠ j

2.4 Conditional probability
Conditional probability of A given that B has already occurred: P_B(A) = P(A|B) = P(A ∩ B)/P(B)
Multiplication rule: P(A ∩ B) = P(A|B)·P(B) = P(B|A)·P(A)

2.4.1 Total probability and Bayes' theorem
Required: ⋃_i B_i = Ω with B_i ∩ B_j = ∅ ∀ i ≠ j
Total probability: P(A) = ∑_{i∈I} P(A|B_i)·P(B_i)
Bayes' theorem: P(B_k|A) = P(A|B_k)·P(B_k) / ∑_{i∈I} P(A|B_i)·P(B_i)
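A small numeric sketch of total probability and Bayes' theorem from 2.4.1 (the two-event partition and all probabilities below are made up for illustration):

```python
# Total probability and Bayes' rule on a hypothetical two-component partition B1, B2 of Omega
P_B = {"B1": 0.3, "B2": 0.7}             # prior P(B_i)
P_A_given_B = {"B1": 0.9, "B2": 0.2}     # likelihood P(A | B_i)

# Total probability: P(A) = sum_i P(A|B_i) * P(B_i)
P_A = sum(P_A_given_B[b] * P_B[b] for b in P_B)

# Bayes: P(B_k | A) = P(A|B_k) * P(B_k) / P(A)
posterior = {b: P_A_given_B[b] * P_B[b] / P_A for b in P_B}
print(P_A, posterior)   # the posterior sums to 1
```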
2.5 Random variable
X: Ω → Ω′ is a random variable if for every event A′ ∈ F′ in the image space there exists an event A in the pre-image space F such that {ω ∈ Ω | X(ω) ∈ A′} ∈ F.

2.6 Distribution
Probability density function (PDF): f_X(x) = dF_X(x)/dx
Cumulative distribution function (CDF): F_X(x) = ∫_{−∞}^{x} f_X(ξ) dξ
Joint CDF: F_{X,Y}(x, y) = P({X ≤ x, Y ≤ y})

2.7 Relations between f_X(x), f_{X,Y}(x, y), f_{X|Y}(x|y)
f_{X,Y}(x, y) = f_{X|Y}(x|y)·f_Y(y) = f_{Y|X}(y|x)·f_X(x)
Marginalization: ∫_{−∞}^{∞} f_{X,Y}(x, ξ) dξ = f_X(x)
Total probability: ∫_{−∞}^{∞} f_{X|Y}(x|ξ)·f_Y(ξ) dξ = f_X(x)

2.8 Conditional random variables
Event A given: F_{X|A}(x|A) = P(X ≤ x | A)
RV Y given: F_{X|Y}(x|y) = P(X ≤ x | Y = y)
p_{X|Y}(x|y) = p_{X,Y}(x, y)/p_Y(y)
f_{X|Y}(x|y) = f_{X,Y}(x, y)/f_Y(y) = dF_{X|Y}(x|y)/dx

2.9 Independence of random variables
X₁, ..., X_n are stochastically independent if for every x ∈ Rⁿ:
F_{X₁,...,X_n}(x₁, ..., x_n) = ∏_{i=1}^n F_{X_i}(x_i)
f_{X₁,...,X_n}(x₁, ..., x_n) = ∏_{i=1}^n f_{X_i}(x_i)
p_{X₁,...,X_n}(x₁, ..., x_n) = ∏_{i=1}^n p_{X_i}(x_i)

3. Common Distributions

3.1 Binomial distribution B(n, p) with p ∈ [0, 1], n ∈ N
Sequence of n Bernoulli experiments; p: probability of success, k: number of successes.
p_X(k) = B_{n,p}(k) = C(n, k)·p^k·(1 − p)^{n−k} for k ∈ {0, ..., n}, 0 otherwise
Expected value: E[X] = np    Variance: Var[X] = np(1 − p)
Probability-generating function: G_X(z) = (pz + 1 − p)^n

3.2 Normal distribution N(µ, σ²) with µ ∈ R, σ > 0
PDF: f_X(x) = 1/√(2πσ²)·exp(−(x − µ)²/(2σ²)), x ∈ R
Expected value: E(X) = µ    Variance: Var(X) = σ²
Characteristic function: φ_X(ω) = exp(jωµ − ω²σ²/2)
[Figure: PDF φ_{µ,σ}(x) and CDF Φ_{µ,σ}(x) for µ = 0, σ² ∈ {0.2, 1.0, 5.0} and µ = −2, σ² = 0.5]

3.3 Other distributions
Exponential: f(x, λ) = λ·e^{−λx}, E[X] = λ^{−1}, Var[X] = λ^{−2}
Gamma distribution Γ(α, β): E[X] = α/β

4. Important Parameters

4.1 Expected value (1st moment)
Gives the mean value of a random variable.
Discrete X: Ω → Ω′:  µ_X = E[X] = ∑_{x∈Ω′} x·P_X(x)
Continuous X: Ω → R:  µ_X = E[X] = ∫_R x·f_X(x) dx
E[αX + βY] = α·E[X] + β·E[Y]
X ≤ Y ⇒ E[X] ≤ E[Y]
E[X²] = Var[X] + E[X]²
E[XY] = E[X]·E[Y] if X and Y are stochastically independent.
The converse does not hold: uncorrelatedness ⇏ stochastic independence!

4.1.1 For functions of random variables g(x)
E[g(X)] = ∑ g(x)·P_X(x) (discrete)  or  ∫ g(x)·f_X(x) dx (continuous)

4.2 Variance (2nd central moment)
A measure of the strength of the deviation from the expected value.
σ²_X = Var[X] = E[(X − E[X])²] = E[X²] − E[X]²
Var[αX + β] = α²·Var[X]    Var[X] = Cov[X, X]
Var[∑_{i=1}^n X_i] = ∑_{i=1}^n Var[X_i] + ∑_{i≠j} Cov[X_i, X_j]
Standard deviation: σ = √(Var[X])

4.3 Covariance
A measure of the linear relationship between two variables.
Cov[X, Y] = E[(X − E[X])·(Y − E[Y])^⊤] = E[X·Y^⊤] − E[X]·E[Y]^⊤ = Cov[Y, X]
Cov[αX + β, γY + δ] = αγ·Cov[X, Y]
Cov[X + U, Y + V] = Cov[X, Y] + Cov[X, V] + Cov[U, Y] + Cov[U, V]

4.3.1 Correlation = standardized covariance
ρ(X, Y) = Cov[X, Y]/√(Var[X]·Var[Y]) = C_{x,y}/(σ_x·σ_y),  ρ(X, Y) ∈ [−1, 1]

4.3.2 Covariance matrix for z = (x, y)^⊤
Cov[z] = C_z = [Cov[X, X]  Cov[X, Y]; Cov[Y, X]  Cov[Y, Y]] = [C_X  C_XY; C_XY  C_Y]
Always symmetric: C_xy = C_yx!  For matrices: C_xy = C_yx^⊤
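The moments of Section 4 can be checked empirically; a minimal NumPy sketch (not part of the original sheet) with two artificially correlated samples, where the coefficients 0.8 and 0.6 are arbitrary:

```python
import numpy as np

# Empirical check of the moments in Section 4 with two correlated samples
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 100_000)
y = 0.8 * x + 0.6 * rng.normal(0.0, 1.0, 100_000)   # Var[y] = 0.64 + 0.36 = 1

mean_x = x.mean()                                   # E[X] ~ 0
var_x = np.mean((x - mean_x) ** 2)                  # E[X^2] - E[X]^2 ~ 1
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))   # Cov[X, Y] ~ 0.8
rho = cov_xy / np.sqrt(var_x * y.var())             # correlation in [-1, 1], ~ 0.8
print(mean_x, var_x, cov_xy, rho)
```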
5. Estimation

5.1 Estimation Statistic
Estimation treats the problem of inferring underlying characteristics of unknown random variables on the basis of observations of outputs of those random variables.
Sample space Ω: non-empty set of outputs of an experiment
Sigma algebra F ⊆ 2^Ω: set of subsets of the outputs (events)
Probability measure P: F → [0, 1]
Random variable X: Ω → 𝒳: maps subsets of Ω to single values of X
Observation space 𝒳: possible observations of X; observations x₁, ..., x_N
Unknown parameter θ ∈ Θ: parameter of the probability function
Estimator T: 𝒳 → Θ: T(x) = θ̂, finds the estimate θ̂ from the observations
Random variable of the parameter: Θ; estimate of the R.V. of the parameter: T(X) = Θ̂

5.2 Quality Properties of Estimators
Consistent: if lim_{N→∞} T(x₁, ..., x_N) = θ
Bias: Bias(T) := E[T(X₁, ..., X_N)] − θ; unbiased if Bias(T) = 0
(Biased estimators can provide better estimates than unbiased estimators.)
Variance: Var[T] := E[(T − E[T])²]

5.3 Mean Square Error (MSE)
The MSE is an extension of the variance Var[T] := E[(T − E[T])²]:
MSE: ε[T] = E[(T − θ)²] = Var(T) + (Bias[T])² = E[(θ̂ − θ)²]
If Θ is also a random variable ⇒ take the mean over both (e.g. Bayes estimation):
Mean MSE: E[(T(X) − Θ)²] = E[ E[(T(X) − Θ)² | Θ = θ] ]

5.3.1 Minimum Mean Square Error (MMSE)
Minimizes the mean square error: arg min_{θ̂} E[(θ̂ − θ)²]
E[(θ̂ − θ)²] = E[θ²] − 2·θ̂·E[θ] + θ̂²
Solution: d/dθ̂ E[(θ̂ − θ)²] = −2·E[θ] + 2·θ̂ = 0 ⇒ θ̂_MMSE = E[θ]

5.4 Maximum Likelihood
Given the model {X, F, P_θ; θ ∈ Θ}, assume P_θ(x) or f_X(x, θ) for the observed data x. Estimate the parameter θ such that the likelihood L(x, θ) or L(θ | X = x) of obtaining x is maximized.
Likelihood function (probability of θ given x):
Discrete: L(x₁, ..., x_N; θ) = P_θ(x₁, ..., x_N)
Continuous: L(x₁, ..., x_N; θ) = f_{X₁,...,X_N}(x₁, ..., x_N, θ)
If the N observations are independently identically distributed (i.i.d.):
L(x, θ) = ∏_{i=1}^N P_θ(x_i) = ∏_{i=1}^N f_{X_i}(x_i)
ML estimator (picks θ): T_ML: 𝒳 → argmax_{θ∈Θ} L(x, θ) = argmax_{θ∈Θ} log L(x, θ) = argmax_{θ∈Θ} ∑_i log L(x_i, θ) (i.i.d.)
Find the maximum: ∂L(x, θ)/∂θ = d/dθ log L(x; θ) = 0 at θ = θ̂
Solve for θ to obtain the ML estimator function θ̂_ML. Check the quality of the estimator with the MSE.
The maximum-likelihood estimator is asymptotically efficient. However, there might not be enough samples, and the likelihood function is often not known.

5.5 Uniformly Minimum Variance Unbiased (UMVU) Estimators (best unbiased estimators)
Best unbiased estimator: lowest variance among all unbiased estimators.
Fisher's information inequality gives a lower bound on the variance if
• L(x, θ) > 0 ∀ x, θ
• L(x, θ) is differentiable with respect to θ
• ∫_𝒳 ∂/∂θ L(x, θ) dx = ∂/∂θ ∫_𝒳 L(x, θ) dx
Score function: g(x, θ) = ∂/∂θ log L(x, θ) = (∂L(x, θ)/∂θ)/L(x, θ),  E[g(X, θ)] = 0
Fisher information: I_F(θ) := Var[g(X, θ)] = E[g(X, θ)²] = −E[∂²/∂θ² log L(X, θ)]
Cramér-Rao lower bound (CRB): Var[T(X)] ≥ (∂/∂θ E[T(X)])² / I_F(θ);  if T is unbiased: Var[T(X)] ≥ 1/I_F(θ)
For N i.i.d. observations: I_F^(N)(θ) = N·I_F^(1)(θ)
Some derivations (check in exam):
Uniform: not differentiable ⇒ no I_F(θ)
Normal N(θ, σ²): g(x, θ) = (x − θ)/σ², I_F(θ) = 1/σ²
Binomial B(θ, K): g(x, θ) = x/θ − (K − x)/(1 − θ), I_F(θ) = K/(θ·(1 − θ))

5.5.1 Exponential Models
If f_X(x) = h(x)·exp(a(θ)·t(x))/exp(b(θ)), then I_F(θ) = ∂a(θ)/∂θ · ∂E[t(X)]/∂θ

5.6 Bayes Estimation (Conditional Mean)
A priori information about θ is known as a probability f_Θ(θ; σ) with random variable Θ and parameter σ. The conditional pdf f_{X|Θ}(x, θ) is then used to find θ by minimizing the mean MSE instead of the uniform MSE.
Mean MSE for Θ: E[ E[(T(X) − Θ)² | Θ = θ] ]
MMSE estimator: θ̂_MMSE = arg min MSE
Conditional mean estimator: T_CM: x ↦ E[Θ | X = x] = ∫_Θ θ·f_{Θ|X}(θ|x) dθ
θ̂_CM = E[Θ | X = x] minimizes the MSE among all estimators.
Posterior: f_{Θ|X}(θ|x) = f_{X|Θ}(x|θ)·f_Θ(θ)/f_X(x) = f_{X|Θ}(x|θ)·f_Θ(θ) / ∫_Θ f_{X,Θ}(x, ξ) dξ
Hint: to calculate f_{Θ|X}(θ|x), replace every factor not containing θ, such as 1/f_X(x), by a factor γ and determine γ at the end such that ∫_Θ f_{Θ|X}(θ|x) dθ = 1.
Orthogonality principle: T_CM(X) − Θ ⊥ h(X), i.e. E[(T_CM(X) − Θ)·h(X)] = 0
Multivariate Gaussian X, Θ ~ N:
T_CM: x ↦ E[Θ | X = x] = µ_Θ + C_{Θ,X}·C_X^{−1}·(x − µ_X)
MMSE: E[‖T_CM − Θ‖²] = tr(C_{Θ|X}) = tr(C_Θ − C_{Θ,X}·C_X^{−1}·C_{X,Θ})

5.7 Example: estimate the mean θ of X with prior knowledge θ ∈ Θ ~ N
X ~ N(θ, σ²_{X|Θ=θ}) and Θ ~ N(m, σ²_Θ); law of total variance: σ²_X = σ²_{X|Θ=θ} + σ²_Θ
For N independent observations x_i: θ̂_ML = (1/N)·∑_i x_i
θ̂_CM = E[Θ | X = x] = N·σ²_Θ/(σ²_{X|Θ=θ} + N·σ²_Θ) · θ̂_ML + σ²_{X|Θ=θ}/(σ²_{X|Θ=θ} + N·σ²_Θ) · m
Large N ⇒ ML better, small N ⇒ CM better.
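A hedged NumPy sketch (not part of the original sheet) of Sections 5.2 to 5.5 for the Gaussian case: ML estimates of mean and variance, plus an empirical check that the MSE of the sample mean reaches the CRB 1/I_F^(N)(µ) = σ²/N; all numeric values are illustrative:

```python
import numpy as np

# ML estimates for N(mu, sigma^2) from i.i.d. samples and an empirical
# check of the CRB for the sample mean: Var[mu_ML] >= sigma^2 / N
rng = np.random.default_rng(1)
mu_true, sigma_true, N = 2.0, 1.5, 100

x = rng.normal(mu_true, sigma_true, size=N)
mu_ml = x.mean()                                # (1/N) * sum x_i
var_ml = np.mean((x - mu_ml) ** 2)              # biased by the factor (N-1)/N

# MSE of the sample mean over many repeated experiments vs. CRB = sigma^2 / N
trials = rng.normal(mu_true, sigma_true, size=(5000, N)).mean(axis=1)
mse = np.mean((trials - mu_true) ** 2)
print(mu_ml, var_ml, mse, sigma_true**2 / N)    # mse is close to the CRB
```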
6. Linear Estimation
Given: N observations (y_i, x_i), unknown parameters t, noise m.
y = (y₁, ..., y_N)^⊤, X = (x₁, ..., x_N)^⊤ with rows x_i^⊤. Note: ŷ ≠ y!
Problem: estimate y based on the given (known) observations x and the unknown parameter t with the assumed linear model ŷ = x^⊤·t.
Estimation: ŷ = x^⊤·t + m. Note that y = x^⊤·t + m can be written as y = x′^⊤·t′ with x′ = (x, 1)^⊤ and t′ = (t, m)^⊤.
t is now the unknown parameter θ, we want to estimate y, and x is the input vector. Compare the regression problem y = A·x, where we solve for x; here we solve for t, because x is known (measured)!
1. Training → 2. Estimation. Training: we observe y and x (knowing both); based on that we then estimate y given x (observing only x) with the linear model ŷ = x^⊤·t (see the sketch after this chapter).
Sometimes in exams: ŷ = x^⊤·t ⇔ x̂ = T^⊤·y (estimate x given y and unknown T).

6.1 Least Squares Estimation (LSE)
Minimizes the squared error for the linear model ŷ_LS = x^⊤·t_LS:
Least squares error: min_t ∑_{i=1}^N (y_i − x_i^⊤·t)² = min_t ‖y − X·t‖²
t_LS = (X^⊤·X)^{−1}·X^⊤·y,  ŷ_LS = X·t_LS ∈ span(X)
Orthogonality principle (N observations x_i ∈ R^d):
Y − X·T_LS ⊥ span[X] ⇔ Y − X·T_LS ∈ null[X^⊤], thus X^⊤·(Y − X·T_LS) = 0, and if N > d ∧ rank[X] = d: T_LS = (X^⊤·X)^{−1}·X^⊤·Y

6.2 Linear Minimum Mean Square Estimator (LMMSE)
Estimate y with a linear estimator t, such that ŷ = t^⊤·x + m.
Note: the model does not need to be linear, only the estimator is linear!
ŷ_LMMSE = arg min_{t,m} E[(y − (t^⊤·x + m))²]
If the joint random variable z = (x, y)^⊤ has µ_z = (µ_x, µ_y)^⊤ and C_z = [C_x  c_xy; c_yx  c_y], the LMMSE estimate of y given x is
ŷ = µ_y + c_yx·C_x^{−1}·(x − µ_x) = c_yx·C_x^{−1}·x + (µ_y − c_yx·C_x^{−1}·µ_x), i.e. t^⊤ = c_yx·C_x^{−1} and m = µ_y − c_yx·C_x^{−1}·µ_x.
Minimum MSE: E[(y − (x^⊤·t + m))²] = c_y − c_yx·C_x^{−1}·c_xy = c_y − t^⊤·c_xy
Hint: first calculate ŷ in general and then set the variables according to the system equation.
Multivariate: ŷ = T_LMMSE^⊤·x with T_LMMSE^⊤ = C_yx·C_x^{−1}
If µ_z = 0, the estimator is ŷ = c_yx·C_x^{−1}·x.

6.3 Matched Filter Estimator (MF)
For the channel y = h·x + v, filtered: t^⊤·y = t^⊤·h·x + t^⊤·v.
Find the filter t^⊤ that maximizes the SNR = ‖h·x‖²/‖v‖²:
t_MF = arg max_t E[(t^⊤·h·x)²] / E[(t^⊤·v)²]
In the lecture (estimate h): T_MF = arg max_T E[‖ĥ^H·h‖²]/tr(Var[T·n]),  ĥ_MF = T_MF·y,  T_MF ∝ C_h·S^H·C_n^{−1}

6.4 Example: Linear Channel Model
System model: y_n = H·s_n + η_n with H = (h_{m,k}) ∈ C^{M×K} (m ∈ [1, M], k ∈ [1, K])
Linear channel model y = S·h + n with h ~ N(0, C_h) and n ~ N(0, C_n)
A linear estimator T estimates ĥ = T·y ∈ C^{MK}:
T_MMSE = C_hy·C_y^{−1} = C_h·S^H·(S·C_h·S^H + C_n)^{−1}
T_ML = T_Cor = (S^H·C_n^{−1}·S)^{−1}·S^H·C_n^{−1}
T_MF ∝ C_h·S^H·C_n^{−1}
Common simplifying assumption: S^H·S = N·σ_s²·1_{K×M} and C_n = σ_η²·1_{N×M}

6.5 Estimators
Upper bound θ of Uniform[0; θ]: θ̂_ML = max_i x_i (the moment estimator (2/N)·∑_i x_i is unbiased)
Probability p of B(p, N): p̂_ML = x/N,  p̂_CM = (x + 1)/(N + 2)
Mean µ of N(µ, σ²): µ̂_ML = (1/N)·∑_{i=1}^N x_i
Variance σ² of N(µ, σ²): σ̂²_ML = (1/N)·∑_{i=1}^N (x_i − µ)²
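A minimal NumPy sketch of the least-squares estimator from 6.1 on synthetic data (model, noise level and dimensions are arbitrary); it also illustrates the orthogonality principle:

```python
import numpy as np

# Least-squares fit t_LS = (X^T X)^{-1} X^T y of a linear model y = X t + noise (6.1)
rng = np.random.default_rng(2)
N, d = 200, 3
X = rng.normal(size=(N, d))
t_true = np.array([1.0, -2.0, 0.5])
y = X @ t_true + 0.1 * rng.normal(size=N)       # noisy training observations

t_ls = np.linalg.solve(X.T @ X, X.T @ y)        # normal equations
t_ref, *_ = np.linalg.lstsq(X, y, rcond=None)   # numerically preferable solver

# Orthogonality principle: the residual y - X t_LS is orthogonal to span[X]
residual = y - X @ t_ls
print(t_ls, t_ref, X.T @ residual)              # X^T residual ~ 0
```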
7. Gaussian Stuff

7.1 Gaussian Channel
Channel: Y_i = h·s_i + N_i with h ~ N, N ~ N
L(y₁, ..., y_N) = ∏_{i=1}^N f_{Y_i}(y_i, h)
f_{Y_i}(y_i, h) = 1/√(2πσ²)·exp(−(y_i − h·s_i)²/(2σ²))
ĥ_ML = arg min_h ‖y − h·s‖² = s^⊤·y/(s^⊤·s)
If the channel is multidimensional, y = S·h + n:
L(y, h) = 1/√(det(2πC))·exp(−½·(y − S·h)^⊤·C^{−1}·(y − S·h))
l(y, h) = −½·log(det(2πC)) − ½·(y − S·h)^⊤·C^{−1}·(y − S·h)
d/dh (y − S·h)^⊤·C^{−1}·(y − S·h) = −2·S^⊤·C^{−1}·(y − S·h)
Gaussian covariance: if Y ~ N(0, σ²), N ~ N(0, σ²):
C_Y = Cov[Y, Y] = E[(Y − µ)(Y − µ)^⊤] = E[Y·Y^⊤]
For the channel Y = S·h + N: E[Y·Y^⊤] = S·E[h·h^⊤]·S^⊤ + E[N·N^⊤]

7.2 Multivariate Gaussian Distributions
A vector x of n independent Gaussian random variables x_i is jointly Gaussian. If x ~ N(µ_x, C_x):
f_x(x) = f_{x₁,...,x_n}(x₁, ..., x_n) = 1/√(det(2πC_x))·exp(−½·(x − µ_x)^⊤·C_x^{−1}·(x − µ_x))
Affine transformations y = A·x + b are jointly Gaussian with y ~ N(A·µ_x + b, A·C_x·A^⊤).
All marginal PDFs are Gaussian as well.
Contour lines: ellipsoids with centre E[y]; the main axes are the eigenvectors of C_y^{−1}.

7.3 Conditional Gaussian
A ~ N(µ_A, C_A), B ~ N(µ_B, C_B) ⇒ (A|B = b) ~ N(µ_{A|B}, C_{A|B})
Conditional mean: E[A|B = b] = µ_{A|B=b} = µ_A + C_{AB}·C_{BB}^{−1}·(b − µ_B)
Conditional covariance: C_{A|B} = C_{AA} − C_{AB}·C_{BB}^{−1}·C_{BA}

7.4 Misc
If the CDF Φ(z) of the standard Gaussian N(0, 1) is given, then for X ~ N(µ_x, 1) the CDF is Φ(x − µ_x).

8. Sequences

8.1 Random Sequences
Sequence of a random variable. Example: the result of one die roll is a random variable; rolling the die several times yields a random sequence.

8.2 Markov Sequence X_n: Ω → 𝒳_n
Sequence of memoryless state transitions with certain probabilities.
1st state: f_{X₁}(x₁);  2nd state: f_{X₂|X₁}(x₂|x₁);  n-th state: f_{X_n|X_{n−1}}(x_n|x_{n−1})
f_{X₁} → f_{X₂|X₁} → f_{X₃|X₂} → ... → f_{X_n|X_{n−1}} → f_{X_{n+1}|X_n}

8.3 Hidden Markov Chains
Problem: the states X_i are not visible and can only be inferred indirectly through random variables Y_i.
[Figure: hidden Markov chain with states X₁, ..., X_n, state-transition pdfs f_{X_n|X_{n−1}}, likelihoods f_{Y_n|X_n}, and observations Y₁, ..., Y_n]
Likelihood pdf: f_{Y_n|X_n};  state-transition pdf: f_{X_n|X_{n−1}};  conditional pdf: f_{X_n|Y_n}
Estimation: f_{X_n|Y_n} ∝ f_{Y_n|X_n} · (prediction from the previous step), i.e. the posterior conditional PDF is
f_{X_n|Y_n}(x_n|y_n) ∝ f_{Y_n|X_n}(y_n|x_n) · ∫ f_{X_n|X_{n−1}}(x_n|x_{n−1}) · f_{X_{n−1}|Y_{n−1}}(x_{n−1}|y_{n−1}) dx_{n−1}
(likelihood × ∫ state transition × last conditional PDF)

9. Recursive Estimation

9.1 Kalman Filter
Recursively calculates the most likely state from the previous state estimate and the current observation. Shows optimum performance for Gauss-Markov sequences.
State space:
x_n = G_n·x_{n−1} + B·u_n + v_n
y_n = H_n·x_n + w_n
with Gaussian process/measurement noise v_n / w_n.
Short notation: E[x_n|y_{n−1}] = x̂_{n|n−1},  E[x_n|y_n] = x̂_{n|n},  E[y_n|y_{n−1}] = ŷ_{n|n−1}
Init: x̂_{0|−1} = E[X₀],  σ²_{0|−1} = Var[X₀]
1. Step: Prediction
Mean: x̂_{n|n−1} = G_n·x̂_{n−1|n−1}
Covariance: C_{x,n|n−1} = G_n·C_{x,n−1|n−1}·G_n^⊤ + C_{v,n}
2. Step: Update
Mean: x̂_{n|n} = x̂_{n|n−1} + K_n·(y_n − H_n·x̂_{n|n−1})  (prediction E[X_n|Y_{n−1} = y_{n−1}] plus correction E[X_n|∆Y_n = y_n] based on the innovation ∆y_n)
Covariance: C_{x,n|n} = C_{x,n|n−1} − K_n·H_n·C_{x,n|n−1}
Optimal Kalman gain (prediction for x_n based on ∆y_n):
K_n = C_{x,n|n−1}·H_n^⊤·(H_n·C_{x,n|n−1}·H_n^⊤ + C_{w,n})^{−1} = C_{x,n|n−1}·H_n^⊤·C_{∆y,n}^{−1}
Innovation (closeness of the estimated mean value to the real value): ∆y_n = y_n − ŷ_{n|n−1} = y_n − H_n·x̂_{n|n−1}
MMSE estimator: x̂_n = ∫ x_n·f_{X_n|Y_n}(x_n|y_n) dx_n
For non-linear problems: suboptimum non-linear filters: Extended KF, Unscented KF, Particle Filter (a scalar numeric sketch follows after this chapter).

9.2 Extended Kalman Filter (EKF)
Linear approximation of the non-linear g, h:
x_n = g_n(x_{n−1}, v_n) with v_n ~ N
y_n = h_n(x_{n−1}, w_n) with w_n ~ N

9.3 Unscented Kalman Filter (UKF)
Approximation of the desired PDF f_{X_n|Y_n}(x_n|y_n) by a Gaussian PDF.

9.4 Particle Filter
For non-linear state spaces and non-Gaussian noise.
Non-linear state space: x_n = g_n(x_{n−1}, v_n),  y_n = h_n(x_{n−1}, w_n)
N random particles with particle weight w_n^(i) at time n.
Monte Carlo integration: I = E[g(X)] ≈ I_N = (1/N)·∑_{i=1}^N g(x_i)
If ∫ f_{X_n}(x) dx ≠ 1, use weighted samples: I_N = (1/N)·∑_i w̃_i·g(x_i)
Importance sampling: instead of f_X(x) use an importance density q_X(x):
I_N = (1/N)·∑_{i=1}^N w̃_i·g(x_i) with weights w̃_i = f_X(x_i)/q_X(x_i)

9.5 Conditional Stochastic Independence
Given Y, the variables X and Z are independent if
f_{Z|Y,X}(z|y, x) = f_{Z|Y}(z|y)  or  f_{X,Z|Y}(x, z|y) = f_{Z|Y}(z|y)·f_{X|Y}(x|y)
f_{Z|X,Y}(z|x, y) = f_{Z|Y}(z|y)  or  f_{X|Z,Y}(x|z, y) = f_{X|Y}(x|y)
P(A ∩ B|E) = P(A|E)·P(B|E)
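A minimal scalar Kalman filter following the prediction/update equations of 9.1, assuming a 1-D state; g, h and the noise variances are made-up example values (not from the original sheet):

```python
import numpy as np

# Scalar Kalman filter for x_n = g*x_{n-1} + v_n, y_n = h*x_n + w_n (Sec. 9.1)
rng = np.random.default_rng(3)
g, h, var_v, var_w = 0.95, 1.0, 0.1, 0.5

x = 0.0                      # true (hidden) state
x_hat, c = 0.0, 1.0          # init: x_hat_{0|-1} = E[X_0], c = Var[X_0]
for n in range(50):
    # simulate the true state and the observation
    x = g * x + rng.normal(0.0, np.sqrt(var_v))
    y = h * x + rng.normal(0.0, np.sqrt(var_w))

    # 1. prediction
    x_pred = g * x_hat
    c_pred = g * c * g + var_v
    # 2. update with Kalman gain K_n = C H^T (H C H^T + C_w)^{-1}
    K = c_pred * h / (h * c_pred * h + var_w)
    x_hat = x_pred + K * (y - h * x_pred)    # innovation dy = y - H x_pred
    c = c_pred - K * h * c_pred

print(x, x_hat, c)   # estimate tracks the hidden state, c is the posterior variance
```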
10. Hypothesis Testing
Making a decision based on the observations.

10.1 Definition
Null hypothesis H₀: θ ∈ Θ₀ (assumed to be true at first)
Alternative hypothesis H₁: θ ∈ Θ₁ (the one to prove)
Decision rule φ: X → [0, 1] with φ(x) = 1: decide for H₁, φ(x) = 0: decide for H₀
Error level α with E[d(X)|θ] ≤ α ∀ θ ∈ Θ₀

Error types:
Decision \ Reality                         | H₁ false (H₀ true)                      | H₁ true (H₀ false)
H₁ rejected (H₀ accepted)                  | true negative, P = 1 − α                | false negative (type 2), P = β
H₁ accepted (H₀ rejected), detection (DE)  | false positive (type 1, false alarm FA), P = α | true positive, P = 1 − β

Power / sensitivity / recall / hit rate: TP/(TP + FN) = 1 − β
Specificity / true negative rate: TN/(FP + TN) = 1 − α
Precision / positive prediction rate: TP/(TP + FP)
Accuracy: (TP + TN)/(P + N) = (2 − α − β)/2

10.1.1 Design of a test
Cost criterion G_φ: Θ → [0, 1], θ ↦ E[d(X)|θ]
False positive rate lower than α: G_d(θ) ≤ α ∀ θ ∈ Θ₀
False negative rate as small as possible: maximize G_d(θ) ∀ θ ∈ Θ₁

10.2 Sufficient Statistics
Sufficiency of a statistic T(X) means that no other test statistic, i.e. function of the observations x, contains additional information about the parameter θ to be estimated:
f_{X|T}(x|T(x) = t, θ) = f_{X|T}(x|T(x) = t)

11. Tests

11.1 Neyman-Pearson Test
X₀: null hypothesis, X₁: alternative; error level α.
Likelihood ratio: R(x) = f_X(x; θ₁)/f_X(x; θ₀)
The best test of P₀ against P₁ is
d_NP(x) = 1 if R(x) > c,  γ if R(x) = c,  0 if R(x) < c,
with γ = (α − P₀({R > c}))/P₀({R = c}).
Steps: for the given α calculate x_α, then c = R(x_α).
ROC graphs: plot the detection probability G_d(θ₁) as a function of the false alarm probability G_d(θ₀).
Maximum likelihood detector: d_ML(x) = 1 if R(x) > 1, 0 otherwise.

11.2 Bayes Test (MAP Detector)
Prior knowledge of the possible hypotheses: P({θ ∈ Θ₀}) + P({θ ∈ Θ₁}) = 1; minimizes the probability of a wrong decision.
d_Bayes(x) = 1 if f_X(x|θ₁)/f_X(x|θ₀) > c₀·P(θ₀)/(c₁·P(θ₁)), 0 otherwise
           = 1 if P(θ₁|x) > P(θ₀|x), 0 otherwise (for c₀ = c₁)
Risk weights c₀, c₁ are 1 by default. If P(θ₀) = P(θ₁), the Bayes test is equivalent to the ML test.
Loss function: L(d(x), θ) = c₀ (type 1) if d(x) = 1 but θ = θ₀;  c₁ (type 2) if d(x) = 0 but θ = θ₁
Risk: risk(d) = E[L(d(X), θ)] = E[ E[L(d(x), θ) | x = X] ]
Multiple hypotheses: d_Bayes = 0 if x ∈ X₀, 1 if x ∈ X₁, 2 if x ∈ X₂

11.3 Linear Alternative Tests
d: X → R, x ↦ 1 if w^⊤·x − w₀ > 0, 0 otherwise
Estimate the normal vector w^⊤ that separates X into X₀ and X₁. For two Gaussians the decision boundary is
log R(x) = ½·ln(det(C₀)/det(C₁)) + ½·(x − µ₀)^⊤·C₀^{−1}·(x − µ₀) − ½·(x − µ₁)^⊤·C₁^{−1}·(x − µ₁) = 0
For two Gaussians with C₀ = C₁ = C: w^⊤ = (µ₁ − µ₀)^⊤·C^{−1} and constant translation w₀ = (µ₁ + µ₀)^⊤·C^{−1}·(µ₁ − µ₀)/2

from LaTeX4EI (www.latex4ei.de) – Mail: [email protected] – Please report mistakes immediately. Last revised: August 4, 2016
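A small sketch (not part of the original sheet) of the Neyman-Pearson recipe from 11.1 for H₀: N(0, 1) vs. H₁: N(1, 1), using SciPy's normal distribution; the level α = 0.05 is arbitrary. For equal-variance Gaussians the likelihood ratio is monotone in x, so thresholding x at x_α is equivalent to thresholding R(x) at c = R(x_α):

```python
import numpy as np
from scipy.stats import norm

# Neyman-Pearson test (11.1) for H0: x ~ N(0,1) vs H1: x ~ N(1,1) at level alpha
alpha = 0.05
x_alpha = norm.ppf(1 - alpha, loc=0, scale=1)     # P0(X > x_alpha) = alpha
c = norm.pdf(x_alpha, loc=1, scale=1) / norm.pdf(x_alpha, loc=0, scale=1)  # c = R(x_alpha)
power = 1 - norm.cdf(x_alpha, loc=1, scale=1)     # detection probability 1 - beta

# decision rule applied to some example observations
x = np.array([-0.3, 1.2, 2.1])
R = norm.pdf(x, 1, 1) / norm.pdf(x, 0, 1)         # likelihood ratio R(x)
decide_H1 = R > c                                 # equivalent to x > x_alpha here
print(x_alpha, c, power, decide_H1)
```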