Statistical Signal Processing

* may contain traces of cats – not suitable for people allergic to humor – all information without guarantee
i.i.d.: independently identically distributed

1. Math

1.1. Exp. and Log.
e^x := lim_{n→∞} (1 + x/n)^n,   e ≈ 2.71828
a^x = e^{x·ln a},   ln(x^a) = a·ln(x),   log_a(x) = ln(x)/ln(a)
ln(x/a) = ln(x) − ln(a),   ln(x) ≤ x − 1,   log(1) = 0
Constants: π ≈ 3.14159,  e ≈ 2.71828,  √2 ≈ 1.414,  √3 ≈ 1.732

Binomials, trinomials:
(a ± b)² = a² ± 2ab + b²,   a² − b² = (a − b)(a + b)
(a ± b)³ = a³ ± 3a²b + 3ab² ± b³
(a + b + c)² = a² + b² + c² + 2ab + 2ac + 2bc

Sequences and series:
Arithmetic sum:      Σ_{k=1}^{n} k = n(n+1)/2
Geometric sum:       Σ_{k=0}^{n} q^k = (1 − q^{n+1})/(1 − q)
Exponential series:  Σ_{n=0}^{∞} z^n/n! = e^z

Inequalities:
Bernoulli:        (1 + x)^n ≥ 1 + nx
Triangle:         | |x| − |y| | ≤ |x ± y| ≤ |x| + |y|
Cauchy–Schwarz:   x^⊤ y ≤ ∥x∥ · ∥y∥

Sets, De Morgan:  (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ,   (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ

Means (over i = 1, …, N);  median: middle element of a sorted list
x_ar = (1/N)·Σ x_i  ≥  x_geo = (Π x_i)^{1/N}  ≥  x_hm = N / Σ(1/x_i)
(arithmetic ≥ geometric ≥ harmonic mean)
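The chain of mean inequalities above is easy to check numerically. A minimal NumPy sketch (not part of the original sheet; sample values are arbitrary):

import numpy as np

x = np.array([1.0, 2.0, 4.0, 8.0])      # arbitrary positive sample values
N = len(x)

x_ar  = x.mean()                         # arithmetic mean: (1/N) * sum(x_i)
x_geo = x.prod() ** (1.0 / N)            # geometric mean:  (prod x_i)^(1/N)
x_hm  = N / np.sum(1.0 / x)              # harmonic mean:   N / sum(1/x_i)

print(x_ar, x_geo, x_hm)                 # 3.75 >= 2.828... >= 2.133...
assert x_ar >= x_geo >= x_hm             # holds for any positive data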
1.2. Matrices A ∈ K^{m×n}
A = (a_ij) ∈ K^{m×n} has m rows (index i) and n columns (index j)
(A·B)^⊤ = B^⊤·A^⊤,   (A + B)^⊤ = A^⊤ + B^⊤
(A^⊤)^{−1} = (A^{−1})^⊤,   (A·B)^{−1} = B^{−1}·A^{−1}
dim K^n = n = rank(A) + dim ker(A),   rank(A) = rank(A^⊤)

1.2.1 Square matrices A ∈ K^{n×n}
regular / invertible / non-singular  ⇔  det(A) ≠ 0  ⇔  rank(A) = n
singular / non-invertible            ⇔  det(A) = 0  ⇔  rank(A) ≠ n
orthogonal  ⇔  A^⊤ = A^{−1}  ⇒  det(A) = ±1
symmetric: A = A^⊤,   skew-symmetric: A = −A^⊤

1.2.2 Determinant of A ∈ K^{n×n}: det(A) = |A|
det [A 0; C D] = det [A B; 0 D] = det(A)·det(D)
det(A) = det(A^⊤),   det(A^{−1}) = det(A)^{−1}
det(AB) = det(A)·det(B) = det(B)·det(A) = det(BA)
If A has 2 linearly dependent rows/columns ⇒ |A| = 0

1.2.3 Eigenvalues (EW) λ and eigenvectors (EV) v
A·v = λ·v
det(A) = Π λ_i,    tr(A) = Σ a_ii = Σ λ_i
Eigenvalues: det(A − λ·1) = 0,   eigenvectors: ker(A − λ_i·1) = v_i
The eigenvalues of triangular/diagonal matrices are the elements of the main diagonal.

1.2.4 Special case: 2×2 matrix A = [a b; c d]
det(A) = ad − bc,   tr(A) = a + d
[a b; c d]^{−1} = (1/det A)·[d −b; −c a]
λ_{1/2} = tr(A)/2 ± √( (tr(A)/2)² − det(A) )
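A quick NumPy check of the 2×2 eigenvalue formula and of det(A) = Πλ_i, tr(A) = Σλ_i (added as an illustration, not from the original sheet; the matrix is arbitrary):

import numpy as np

A = np.array([[3.0, 1.0],
              [2.0, 4.0]])               # arbitrary example matrix
lam = np.linalg.eigvals(A)               # numerical eigenvalues

tr, det = np.trace(A), np.linalg.det(A)
lam_formula = np.array([tr/2 + np.sqrt((tr/2)**2 - det),
                        tr/2 - np.sqrt((tr/2)**2 - det)])

print(np.sort(lam.real), np.sort(lam_formula))   # both: [2. 5.]
print(np.prod(lam), det)                         # product of eigenvalues = det(A) = 10
print(np.sum(lam), tr)                           # sum of eigenvalues = tr(A) = 7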
1.2.5 Differentiation (matrix calculus)
∂(x^⊤y)/∂x = ∂(y^⊤x)/∂x = y
∂(x^⊤Ay)/∂A = x·y^⊤
∂(x^⊤Ax)/∂x = (A + A^⊤)·x
∂ det(BAC)/∂A = det(BAC)·(A^{−1})^⊤

1.2.6 Derivative rules (∀ λ, µ ∈ R)
Linearity:   (λf + µg)′(x) = λf′(x) + µg′(x)
Product:     (f·g)′(x) = f′(x)·g(x) + f(x)·g′(x)
Quotient:    (f/g)′(x) = (g(x)·f′(x) − f(x)·g′(x)) / g(x)²    ("NAZ − ZAN over N²")
Chain rule:  (f(g(x)))′ = f′(g(x))·g′(x)

1.3. Integrals
Integration by parts:  ∫ u·w′ = u·w − ∫ u′·w
Substitution:          ∫ f(g(x))·g′(x) dx = ∫ f(t) dt

Antiderivative F(x) − C   |  f(x)      |  f′(x)
x^{q+1}/(q+1)             |  x^q       |  q·x^{q−1}
(2/3)·√(a·x³)             |  √(ax)     |  a/(2·√(ax))
x·ln(ax) − x              |  ln(ax)    |  1/x
e^{ax}·(ax − 1)/a²        |  x·e^{ax}  |  e^{ax}·(ax + 1)
a^x/ln(a)                 |  a^x       |  a^x·ln(a)
e^x                       |  e^x       |  e^x
−cos(x)                   |  sin(x)    |  cos(x)
cosh(x)                   |  sinh(x)   |  cosh(x)
−ln|cos(x)|               |  tan(x)    |  1/cos²(x)

∫ e^{at}·sin(bt) dt = e^{at}·(a·sin(bt) − b·cos(bt))/(a² + b²)
∫ t·e^{at} dt = e^{at}·(at − 1)/a²
∫ t²·e^{at} dt = e^{at}·((at − 1)² + 1)/a³
∫ x·e^{ax²} dx = e^{ax²}/(2a)
∫ dt/√(at + b) = 2·√(at + b)/a

1.3.1 Volume and surface of solids of revolution (about the x-axis)
V = π·∫_a^b f(x)² dx,    O = 2π·∫_a^b f(x)·√(1 + f′(x)²) dx
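Entries of the integral table in 1.3 can be sanity-checked by numerical integration. A small illustrative sketch (parameter and interval chosen arbitrarily) verifies ∫ t·e^{at} dt = e^{at}·(at − 1)/a² with scipy.integrate.quad:

import numpy as np
from scipy.integrate import quad

a, lo, hi = 0.7, 0.0, 2.0                          # arbitrary parameter and interval

F = lambda t: np.exp(a * t) * (a * t - 1) / a**2   # antiderivative from the table
numeric, _ = quad(lambda t: t * np.exp(a * t), lo, hi)

print(numeric, F(hi) - F(lo))                      # both ≈ 5.351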
2. Probability Theory Basics

2.1. Combinatorics
Number of ways to choose k out of at most n elements (or to distribute k elements over n slots):
                      with repetition     without repetition
with order:           n^k                 n!/(n−k)!
order irrelevant:     C(n+k−1, k)         C(n, k)
Permutations of n elements with groups of k_1, k_2, … identical elements: n!/(k_1!·k_2!·…)
Binomial coefficient: C(n, k) = C(n, n−k) = n!/(k!·(n−k)!)
Examples: C(n, 0) = 1,  C(n, 1) = n,  C(4, 2) = 6,  C(5, 2) = 10,  C(6, 2) = 15

2.2. The probability space (Ω, F, P)
Sample space Ω = {ω_1, ω_2, …} with outcomes ω_j ∈ Ω
Event algebra F = {A_1, A_2, …} with events A_i ⊆ Ω
Probability measure P: F → [0, 1];  Laplace: P(A) = |A|/|Ω|

2.3. Probability measure P
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

2.3.1 Kolmogorov axioms
Non-negativity:  P(A) ≥ 0  ⇒  P: F → [0, 1]
Normalization:   P(Ω) = 1
Additivity:      P(∪_{i=1}^{∞} A_i) = Σ_{i=1}^{∞} P(A_i)  if A_i ∩ A_j = ∅ ∀ i ≠ j

2.4. Conditional probability
Conditional probability of A given that B has already occurred:
P_B(A) = P(A|B) = P(A ∩ B)/P(B)
Multiplication rule: P(A ∩ B) = P(A|B)·P(B) = P(B|A)·P(A)

2.4.1 Total probability and Bayes' theorem
Requirement: ∪_i B_i = Ω with B_i ∩ B_j = ∅ ∀ i ≠ j
Total probability:  P(A) = Σ_{i∈I} P(A|B_i)·P(B_i)
Bayes' theorem:     P(B_k|A) = P(A|B_k)·P(B_k) / Σ_{i∈I} P(A|B_i)·P(B_i)
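A tiny numerical illustration of total probability and Bayes' theorem (the numbers below are invented for this sketch, not from the sheet): three disjoint causes B_i with priors P(B_i) and likelihoods P(A|B_i).

import numpy as np

P_B   = np.array([0.5, 0.3, 0.2])        # priors P(B_i), must sum to 1
P_A_B = np.array([0.10, 0.40, 0.80])     # likelihoods P(A | B_i)

P_A   = np.sum(P_A_B * P_B)              # total probability: P(A) = sum P(A|B_i) P(B_i)
P_B_A = P_A_B * P_B / P_A                # Bayes: P(B_k|A) = P(A|B_k) P(B_k) / P(A)

print(P_A)                               # 0.33
print(P_B_A, P_B_A.sum())                # [0.1515..., 0.3636..., 0.4848...], sums to 1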
2.5. Random variable
X: Ω → Ω′ is a random variable if for every event A′ ∈ F′ in the image space there exists an event A in the preimage space F such that {ω ∈ Ω | X(ω) ∈ A′} ∈ F.

2.6. Distribution
Name                              | abbrev. | definition
Probability density function      | pdf     | f_X(x) = dF_X(x)/dx
Cumulative distribution function  | cdf     | F_X(x) = ∫_{−∞}^{x} f_X(ξ) dξ
Joint CDF: F_{X,Y}(x, y) = P({X ≤ x, Y ≤ y});   joint PDF: f_{X,Y}(x, y)

2.7. Relations between f_X(x), f_{X,Y}(x, y), f_{X|Y}(x|y)
f_{X,Y}(x, y) = f_{X|Y}(x|y)·f_Y(y) = f_{Y|X}(y|x)·f_X(x)
Marginalization / total probability:
∫_{−∞}^{∞} f_{X,Y}(x, ξ) dξ = ∫_{−∞}^{∞} f_{X|Y}(x|ξ)·f_Y(ξ) dξ = f_X(x)

2.8. Conditional random variables
Event A given:  F_{X|A}(x|A) = P(X ≤ x | A)
RV Y given:     F_{X|Y}(x|y) = P(X ≤ x | Y = y)
p_{X|Y}(x|y) = p_{X,Y}(x, y)/p_Y(y),    f_{X|Y}(x|y) = f_{X,Y}(x, y)/f_Y(y) = dF_{X|Y}(x|y)/dx

2.9. Independence of random variables
X_1, …, X_n are stochastically independent if for every x ∈ R^n:
F_{X_1,…,X_n}(x_1, …, x_n) = Π_{i=1}^{n} F_{X_i}(x_i)
(and likewise f_{X_1,…,X_n} = Π f_{X_i},  p_{X_1,…,X_n} = Π p_{X_i})
3. Common Distributions

3.1. Binomial distribution B(n, p) with p ∈ [0, 1], n ∈ N
Sequence of n Bernoulli experiments; p: probability of success, k: number of successes.
p_X(k) = B_{n,p}(k) = C(n, k)·p^k·(1 − p)^{n−k}  for k ∈ {0, …, n},  0 otherwise
Expected value: E[X] = np,   variance: Var[X] = np(1 − p)
Probability generating function: G_X(z) = (pz + 1 − p)^n

3.2. Normal distribution N(µ, σ²)
pdf:  f_X(x) = 1/√(2πσ²) · e^{−(x−µ)²/(2σ²)},    x ∈ R,  µ ∈ R,  σ > 0
Expected value: E[X] = µ,   variance: Var[X] = σ²
Characteristic function: φ_X(ω) = e^{jωµ − ω²σ²/2}
[Figure: pdf φ_{µ,σ}(x) and cdf Φ_{µ,σ}(x) for µ = 0, σ² ∈ {0.2, 1.0, 5.0} and µ = −2, σ² = 0.5]

3.3. Other
Exponential: f(x, λ) = λ·e^{−λx},   E[X] = λ^{−1},   Var[X] = λ^{−2}
Gamma distribution Γ(α, β): E[X] = α/β
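The pmf/pdf formulas in this section can be cross-checked against the corresponding scipy.stats objects. An illustrative sketch (parameters chosen arbitrarily, not from the sheet):

import numpy as np
from scipy.stats import binom, norm
from math import comb

n, p, k = 10, 0.3, 4
mu, sigma, x = 1.0, 2.0, 0.5

# binomial pmf vs. C(n,k) p^k (1-p)^(n-k), plus mean and variance
print(binom.pmf(k, n, p), comb(n, k) * p**k * (1 - p)**(n - k))
print(binom.mean(n, p), n * p, binom.var(n, p), n * p * (1 - p))

# Gaussian pdf vs. 1/sqrt(2*pi*sigma^2) * exp(-(x-mu)^2 / (2*sigma^2))
print(norm.pdf(x, mu, sigma),
      np.exp(-(x - mu)**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2))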
4. Important Parameters

4.1. Expected value (1st moment)
Gives the mean value of a random variable.
Discrete X: Ω → Ω′:    µ_X = E[X] = Σ_{x∈Ω′} x·P_X(x)
Continuous X: Ω → R:   µ_X = E[X] = ∫_R x·f_X(x) dx

4.1.1 For functions of random variables g(x)
E[g(X)] = Σ_{x∈Ω′} g(x)·P_X(x)  or  ∫ g(x)·f_X(x) dx
E[αX + βY] = α·E[X] + β·E[Y]
X ≤ Y ⇒ E[X] ≤ E[Y]
E[X²] = Var[X] + E[X]²
E[XY] = E[X]·E[Y]  if X and Y are stochastically independent

4.2. Variance (2nd central moment)
Measure of the strength of the deviation from the expected value.
σ_X² = Var[X] = E[(X − E[X])²] = E[X²] − E[X]²
Standard deviation: σ = √(Var[X])
Var[αX + β] = α²·Var[X],    Var[X] = Cov[X, X]
Var[Σ_{i=1}^{n} X_i] = Σ_{i=1}^{n} Var[X_i] + Σ_{i=1}^{n} Σ_{j≠i} Cov[X_i, X_j]

4.3. Covariance
Measure of the linear relationship between two variables.
Cov[X, Y] = E[(X − E[X])(Y − E[Y])^⊤] = E[XY^⊤] − E[X]·E[Y]^⊤ = Cov[Y, X]
Cov[αX + β, γY + δ] = αγ·Cov[X, Y]
Cov[X + U, Y + V] = Cov[X, Y] + Cov[X, V] + Cov[U, Y] + Cov[U, V]
The converse does not hold: uncorrelatedness ⇏ stochastic independence!

4.3.1 Correlation = standardized covariance
ρ(X, Y) = Cov[X, Y] / √(Var[X]·Var[Y]) = σ_{x,y}/(σ_x·σ_y),    ρ(X, Y) ∈ [−1; 1]

4.3.2 Covariance matrix for z = (x, y)^⊤
Cov[z] = C_z = [C_X  C_XY; C_XY  C_Y] = [Cov[X,X]  Cov[X,Y]; Cov[Y,X]  Cov[Y,Y]]
Always symmetric: c_xy = c_yx;  for matrices: C_xy = C_yx^⊤
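Empirical counterparts of these parameters, as a sketch with simulated data (not part of the original sheet): sample mean, variance, covariance matrix and correlation of two linearly related variables.

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=10_000)
y = 2.0 * x + rng.normal(0.0, 0.5, size=10_000)    # y depends linearly on x

print(x.mean(), x.var())                  # ≈ E[X] = 0, Var[X] = 1
print(np.cov(x, y))                       # 2x2 covariance matrix C_z (symmetric)
print(np.corrcoef(x, y)[0, 1])            # correlation rho(X, Y), close to +1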
5. Estimation

5.1. Estimation
Statistical estimation treats the problem of inferring underlying characteristics of unknown random variables on the basis of observations of outputs of those random variables.

Sample space Ω               | nonempty set of outputs of the experiment
Sigma algebra F ⊆ 2^Ω        | set of subsets of outputs (events)
Probability P: F → [0, 1]    | probability measure
Random variable X: Ω → X     | mapped subsets of Ω
Observations x_1, …, x_N     | single values of X
Observation space X          | possible observations of X
Unknown parameter θ ∈ Θ      | parameter of the probability function
Estimator T: X → Θ           | T(X) = θ̂, finds θ̂ from X
Unknown param. θ             | estimate of the param.: θ̂
R.V. of the param. Θ         | estimate of the R.V. of the param.: T(X) = Θ̂

5.2. Quality Properties of Estimators
Consistent:  lim_{N→∞} T(x_1, …, x_N) = θ
Bias:        Bias(T) := E[T(X_1, …, X_N)] − θ
Unbiased if Bias(T) = 0 (biased estimators can nevertheless provide better estimates than unbiased ones).
5.3. Mean Square Error (MSE)
The MSE is an extension of the variance Var[T] := E[(T − E[T])²]:
MSE:  ε[T] = E[(T − θ)²] = Var[T] + (Bias[T])² = E[(θ̂ − θ)²]
If Θ is itself a random variable, take the mean over both (e.g. Bayes estimation):
Mean MSE:  E[(T(X) − Θ)²] = E[ E[(T(X) − Θ)² | Θ = θ] ]

5.3.1 Minimum Mean Square Error (MMSE)
Minimizes the mean square error:  arg min_{θ̂} E[(θ̂ − θ)²]
E[(θ̂ − θ)²] = E[θ²] − 2θ̂·E[θ] + θ̂²
Solution:  d/dθ̂ E[(θ̂ − θ)²] = −2·E[θ] + 2θ̂ = 0  ⇒  θ̂_MMSE = E[θ]
5.4. Maximum Likelihood
Given the model {X, F, P_θ; θ ∈ Θ}, assume P_θ(x) or f_X(x, θ) for the observed data x. Estimate the parameter θ such that the likelihood L(x, θ), or L(θ | X = x), of obtaining x is maximized.
Likelihood function (probability of x for given θ):
Discrete:    L(x_1, …, x_N; θ) = P_θ(x_1, …, x_N)
Continuous:  L(x_1, …, x_N; θ) = f_{X_1,…,X_N}(x_1, …, x_N, θ)
If the N observations are independently identically distributed (i.i.d.):
L(x, θ) = Π_{i=1}^{N} P_θ(x_i) = Π_{i=1}^{N} f_X(x_i, θ)
ML estimator (picks θ):
T_ML: X → argmax_{θ∈Θ} L(X, θ) = argmax_{θ∈Θ} log L(X, θ)   (i.i.d.: = argmax_{θ∈Θ} Σ_i log L(x_i, θ))
Find the maximum:  ∂L(x, θ)/∂θ = d/dθ log L(x; θ) = 0  at θ = θ̂
Solve for θ to obtain the ML estimator function θ̂_ML; check the quality of the estimator with the MSE.
The maximum-likelihood estimator is asymptotically efficient. However, there might not be enough samples, and the likelihood function is often not known.
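For i.i.d. Gaussian samples the log-likelihood can be maximized in closed form, giving the estimators listed later in 6.5. A small simulation sketch (example values assumed here, not from the sheet):

import numpy as np

rng = np.random.default_rng(1)
mu_true, sigma_true, N = 3.0, 2.0, 1_000
x = rng.normal(mu_true, sigma_true, size=N)

# ML estimates obtained from d/dtheta log L(x, theta) = 0
mu_ml     = x.mean()                      # (1/N) * sum(x_i)
sigma2_ml = np.mean((x - mu_ml)**2)       # (1/N) * sum((x_i - mu_ml)^2), biased

print(mu_ml, sigma2_ml)                   # close to mu_true = 3 and sigma_true^2 = 4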
5.5. Uniformly Minimum Variance Unbiased (UMVU) Estimators (best unbiased estimators)
Best unbiased estimator: lowest variance among all unbiased estimators.
Fisher's information inequality estimates a lower bound on the variance if
• L(x, θ) > 0, ∀x, θ
• L(x, θ) is differentiable in θ
• ∫_X ∂L(x, θ)/∂θ dx = ∂/∂θ ∫_X L(x, θ) dx
Score function:  g(x, θ) = ∂/∂θ log L(x, θ) = (∂L(x, θ)/∂θ) / L(x, θ),    E[g(X, θ)] = 0
Fisher information:  I_F(θ) := Var[g(X, θ)] = E[g(X, θ)²] = −E[∂²/∂θ² log L(X, θ)]
Cramér–Rao lower bound (CRB):
Var[T(X)] ≥ (∂E[T(X)]/∂θ)² / I_F(θ);    if T is unbiased:  Var[T(X)] ≥ 1/I_F(θ)
For N i.i.d. observations:  I_F^(N)(x, θ) = N·I_F^(1)(x, θ)
Some derivations (check in exam):
Uniform: not differentiable ⇒ no I_F(θ)
Normal N(θ, σ²):   g(x, θ) = (x − θ)/σ²,   I_F(θ) = 1/σ²
Binomial B(θ, K):  g(x, θ) = x/θ − (K − x)/(1 − θ),   I_F(θ) = K/(θ(1 − θ))

5.5.1 Exponential Models
If f_X(x) = h(x)·exp(a(θ)·t(x)) / exp(b(θ)),  then  I_F(θ) = ∂a(θ)/∂θ · ∂E[t(X)]/∂θ
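The CRB for the Gaussian mean (I_F = N/σ² for N i.i.d. samples, hence Var[T] ≥ σ²/N) can be checked empirically; the sample mean attains it. Illustrative sketch with assumed parameters:

import numpy as np

rng = np.random.default_rng(2)
theta, sigma, N, runs = 1.0, 2.0, 50, 20_000

# sample-mean estimator applied to many independent data sets
estimates = rng.normal(theta, sigma, size=(runs, N)).mean(axis=1)

crb = sigma**2 / N                        # 1 / I_F(theta) with I_F = N / sigma^2
print(estimates.var(), crb)               # both ≈ 0.08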
5.6. Bayes Estimation (Conditional Mean)
A-priori information about θ is available as a probability density f_Θ(θ; σ) with random variable Θ and parameter σ. The conditional pdf f_{X|Θ}(x, θ) is then used to find θ by minimizing the mean MSE instead of the MSE uniformly over θ.
Mean MSE for Θ:  E[ E[(T(X) − Θ)² | Θ = θ] ]
Conditional mean estimator:  T_CM: x → E[Θ | X = x] = ∫_Θ θ·f_{Θ|X}(θ|x) dθ
Posterior:  f_{Θ|X}(θ|x) = f_{X|Θ}(x|θ)·f_Θ(θ) / f_X(x) = f_{X|Θ}(x|θ)·f_Θ(θ) / ∫_Θ f_{X,ξ}(x, ξ) dξ
Hint: to calculate f_{Θ|X}(θ|x), replace every factor not containing θ, such as 1/f_X(x), by a factor γ and determine γ at the end such that ∫_Θ f_{Θ|X}(θ|x) dθ = 1.
MMSE:  E[Var[X | Θ = θ]];  the CM estimator minimizes the MSE among all estimators.
Orthogonality principle:  E[(T_CM(X) − Θ)·h(X)] = 0,  i.e.  T_CM(X) − Θ ⊥ h(X)
Multivariate Gaussian X, Θ ∼ N:
T_CM: x → E[Θ | X = x] = µ_Θ + C_{Θ,X}·C_X^{−1}·(x − µ_X)
MMSE:  E[∥T_CM − Θ∥²] = tr(C_{Θ|X}) = tr(C_Θ − C_{Θ,X}·C_X^{−1}·C_{X,Θ})

5.7. Example
Estimate the mean θ of X with prior knowledge θ ∈ Θ ∼ N:
X ∼ N(θ, σ²_{X|Θ=θ})  and  Θ ∼ N(m, σ²_Θ)
θ̂_CM = E[Θ | X = x] = (N·σ²_Θ / (σ²_{X|Θ=θ} + N·σ²_Θ))·θ̂_ML + (σ²_{X|Θ=θ} / (σ²_{X|Θ=θ} + N·σ²_Θ))·m
with θ̂_ML = (1/N)·Σ_i x_i for N independent observations x_i;   ⇒ σ²_X = σ²_{X|Θ=θ} + σ²_Θ
Large N ⇒ ML better, small N ⇒ CM better.
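The example in 5.7 written directly in code: the CM estimate is a weighted average of the ML estimate and the prior mean m, with the weights derived above. Sketch with assumed numbers (not from the sheet):

import numpy as np

rng = np.random.default_rng(3)
m, var_prior = 0.0, 1.0                   # prior: Theta ~ N(m, var_prior)
var_x, N     = 4.0, 5                     # likelihood: X | Theta=theta ~ N(theta, var_x)

theta = rng.normal(m, np.sqrt(var_prior))            # draw the "true" parameter
x = rng.normal(theta, np.sqrt(var_x), size=N)         # N observations

theta_ml = x.mean()
w = N * var_prior / (var_x + N * var_prior)            # weight from section 5.7
theta_cm = w * theta_ml + (1 - w) * m                   # conditional-mean (Bayes) estimate

print(theta, theta_ml, theta_cm)          # CM shrinks the ML estimate towards m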
6. Linear Estimation
Estimation:  ŷ = x^⊤·t + m

6.1. Least Squares Estimation (LSE)
Tries to minimize the square error for the linear model ŷ_LS = x^⊤·t_LS.
Given: N observations (y_i, x_i), unknown parameters t, noise m.
Problem: estimate y based on the given (known) observations x and the unknown parameter t with the assumed linear model ŷ = x^⊤·t.
Note: y = x^⊤·t + m can be rewritten as y = x′^⊤·t′ with x′ = (x; 1), t′ = (t; m).
Stack the observations:  y = (y_1, …, y_N)^⊤,  X = (x_1^⊤; …; x_N^⊤).   Note: ŷ ≠ y!
Least square error:  min_t Σ_{i=1}^{N} (y_i − x_i^⊤·t)² = min_t ∥y − X·t∥²
t_LS = (X^⊤X)^{−1}·X^⊤·y,     ŷ_LS = X·t_LS ∈ span(X)
Orthogonality principle (N observations x_i ∈ R^d):
Y − X·T_LS ⊥ span[X]  ⇔  Y − X·T_LS ∈ null[X^⊤],  thus  X^⊤·(Y − X·T_LS) = 0,
and if N > d ∧ rank[X] = d:  T_LS = (X^⊤X)^{−1}·X^⊤·Y
Remarks: t is now the unknown parameter θ; we want to estimate y, and x is the input vector. Compare with the regression problem y = A·x, where we solve for x: here we solve for t, because x is known (measured).
1. Training → 2. Estimation. Training: we observe y and x (knowing both) and, based on that, estimate y given x (only observing x) with the linear model ŷ = x^⊤·t.
Sometimes in exams:  ŷ = x^⊤·t  ⇔  x̂ = T^⊤·y  (estimate x given y and unknown T).
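Least-squares fitting as in 6.1 is a one-liner with NumPy. The sketch below (simulated data, assumed dimensions) compares the normal-equation solution (X^⊤X)^{−1}X^⊤y with np.linalg.lstsq:

import numpy as np

rng = np.random.default_rng(4)
N, d = 100, 3
X = rng.normal(size=(N, d))                        # rows are the observations x_i^T
t_true = np.array([1.0, -2.0, 0.5])
y = X @ t_true + rng.normal(0.0, 0.1, size=N)      # y_i = x_i^T t + noise m_i

t_ls = np.linalg.solve(X.T @ X, X.T @ y)           # (X^T X)^{-1} X^T y
t_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(t_ls, t_lstsq)                               # both close to t_true
y_hat = X @ t_ls                                   # fitted values; y_hat != y in general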
6.2. Linear Minimum Mean Square Error Estimator (LMMSE)
Estimate y with a linear estimator t such that ŷ = t^⊤·x + m.
Note: the model does not need to be linear — only the estimator is linear!
ŷ_LMMSE = arg min_{t,m} E[(y − (t^⊤·x + m))²]
If the joint random variable z = (x; y) has
µ_z = (µ_x; µ_y)   and   C_z = [C_x  c_xy; c_yx  c_y],
then the LMMSE estimate of y given x is
ŷ = µ_y + c_yx·C_x^{−1}·(x − µ_x) = c_yx·C_x^{−1}·x + (µ_y − c_yx·C_x^{−1}·µ_x)
(first term: t^⊤·x,  second term: m)
Minimum MSE:  E[(y − ŷ)²] = c_y − c_yx·C_x^{−1}·c_xy = c_y − t^⊤·c_xy
If µ_z = 0:  ŷ = c_yx·C_x^{−1}·x
Multivariate:  ŷ = T_LMMSE^⊤·x  with  T_LMMSE^⊤ = C_yx·C_x^{−1}
Hint: first calculate ŷ in general and then set the variables according to the system equation.
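A scalar LMMSE sketch (simulated jointly distributed x, y; not from the sheet): estimate t and m from the moments of z = (x, y)^⊤ and compare the resulting error with the minimum MSE c_y − c_yx·c_xy/c_x.

import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(-1.0, 1.0, size=100_000)
y = np.sin(2.0 * x) + rng.normal(0.0, 0.2, size=x.size)   # nonlinear model, linear estimator

mu_x, mu_y = x.mean(), y.mean()
c_x  = np.var(x)
c_yx = np.mean((y - mu_y) * (x - mu_x))
c_y  = np.var(y)

t = c_yx / c_x                            # t^T = c_yx C_x^{-1}
m = mu_y - t * mu_x                       # m   = mu_y - c_yx C_x^{-1} mu_x
y_hat = t * x + m

print(np.mean((y - y_hat)**2), c_y - c_yx**2 / c_x)   # empirical MSE ≈ minimum MSE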
6.3. Matched Filter Estimator (MF)
Find the filter t^⊤ that maximizes the SNR ∥h·x∥²/∥v∥² after filtering.
For the channel y = h·x + v the filtered signal is t^⊤·y = t^⊤·h·x + t^⊤·v:
t_MF = arg max_t  E[(t^⊤·h·x)²] / E[(t^⊤·v)²]
In the lecture (estimate h):  T_MF = arg max_T { E[|ĥ^H·h|²] / tr(Var[T·n]) }
ĥ_MF = T_MF·y,    T_MF ∝ C_h·S^H·C_n^{−1}

6.4. Example
System model:  y_n = H·s_n + η_n  with  H = (h_{m,k}) ∈ C^{M×K}  (m ∈ [1, M], k ∈ [1, K])
Linear channel model  y = S·h + n  with  h ∼ N(0, C_h)  and  n ∼ N(0, C_n)
A linear estimator T estimates  ĥ = T·y ∈ C^{MK}
T_MMSE = C_hy·C_y^{−1} = C_h·S^H·(S·C_h·S^H + C_n)^{−1}
T_ML = T_Cor = (S^H·C_n^{−1}·S)^{−1}·S^H·C_n^{−1}
T_MF ∝ C_h·S^H·C_n^{−1}
Simplifying assumption used in the example:  S^H·S = N·σ_s²·1  and  C_n = σ_η²·1

6.5. Estimators
Upper bound θ of Uniform[0; θ]:  θ̂_ML = max_i x_i   (moment estimator: (2/N)·Σ_i x_i)
Probability p of B(p, N):  p̂_ML = x/N,    p̂_CM = (x + 1)/(N + 2)
Mean µ of N(µ, σ²):  µ̂_ML = (1/N)·Σ_{i=1}^{N} x_i
Variance σ² of N(µ, σ²):  σ̂²_ML = (1/N)·Σ_{i=1}^{N} (x_i − µ)²
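The channel estimators of 6.4 are plain linear algebra. A small real-valued sketch (complex-free simplification with assumed covariances, sizes and pilot matrix; not from the sheet) comparing T_ML and T_MMSE:

import numpy as np

rng = np.random.default_rng(6)
Nobs, M = 20, 3                                     # number of observations, channel taps
S   = rng.normal(size=(Nobs, M))                    # known pilot/training matrix
C_h = np.eye(M)                                     # prior covariance of h
C_n = 0.5 * np.eye(Nobs)                            # noise covariance

h = rng.multivariate_normal(np.zeros(M), C_h)
y = S @ h + rng.multivariate_normal(np.zeros(Nobs), C_n)

Cn_inv = np.linalg.inv(C_n)
T_ml   = np.linalg.inv(S.T @ Cn_inv @ S) @ S.T @ Cn_inv
T_mmse = C_h @ S.T @ np.linalg.inv(S @ C_h @ S.T + C_n)

print(h, T_ml @ y, T_mmse @ y)                      # both estimates close to h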
7. Gaussian Stuff

7.1. Gaussian Channel
Channel:  Y_i = h·s_i + N_i  with  h ∼ N,  N_i ∼ N
L(y_1, …, y_N) = Π_{i=1}^{N} f_{Y_i}(y_i, h)
f_{Y_i}(y_i, h) = 1/√(2πσ²) · exp(−(y_i − h·s_i)²/(2σ²))
ĥ_ML = arg min_h ∥y − h·s∥² = s^⊤·y / (s^⊤·s)
Multidimensional channel y = S·h + n:
L(y, h) = 1/√(det(2πC)) · exp(−½·(y − S·h)^⊤·C^{−1}·(y − S·h))
l(y, h) = −½·( log det(2πC) + (y − S·h)^⊤·C^{−1}·(y − S·h) )
d/dh (y − S·h)^⊤·C^{−1}·(y − S·h) = −2·S^⊤·C^{−1}·(y − S·h)
Gaussian covariance: for zero-mean Y (e.g. Y ∼ N(0, σ²), N ∼ N(0, σ²)):
C_Y = Cov[Y, Y] = E[(Y − µ)(Y − µ)^⊤] = E[Y·Y^⊤]
For the channel Y = S·h + N:  E[Y·Y^⊤] = S·E[h·h^⊤]·S^⊤ + E[N·N^⊤]

7.2. Multivariate Gaussian Distributions
A vector x of n independent Gaussian random variables x_i is jointly Gaussian. If x ∼ N(µ_x, C_x):
f_x(x) = f_{x_1,…,x_n}(x_1, …, x_n) = 1/√(det(2πC_x)) · exp(−½·(x − µ_x)^⊤·C_x^{−1}·(x − µ_x))
Affine transformations y = A·x + b are jointly Gaussian with  y ∼ N(A·µ_x + b, A·C_x·A^⊤).
All marginal PDFs are Gaussian as well.
Contour lines: ellipsoids with center E[y]; the main axes are the eigenvectors of C_y^{−1}.

7.3. Conditional Gaussian
A ∼ N(µ_A, C_A),  B ∼ N(µ_B, C_B)   ⇒   (A | B = b) ∼ N(µ_{A|B}, C_{A|B})
Conditional mean:      E[A | B = b] = µ_{A|B=b} = µ_A + C_AB·C_BB^{−1}·(b − µ_B)
Conditional variance:  C_{A|B} = C_AA − C_AB·C_BB^{−1}·C_BA

7.4. Misc
If Φ(z) denotes the CDF of N(0, 1), then for X ∼ N(µ_x, 1) the CDF is Φ(x − µ_x)
(generally, for X ∼ N(µ, σ²): F_X(x) = Φ((x − µ)/σ)).
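The conditional-Gaussian formulas of 7.3 in code (illustrative sketch, joint covariance and observed value chosen arbitrarily): condition a 2-D Gaussian (A, B) on B = b.

import numpy as np

mu_A, mu_B = 1.0, -1.0
C_AA, C_AB, C_BB = 2.0, 0.8, 1.0          # scalar blocks of the joint covariance
b = 0.5                                   # observed value of B

mean_A_given_B = mu_A + C_AB / C_BB * (b - mu_B)      # mu_A + C_AB C_BB^{-1} (b - mu_B)
var_A_given_B  = C_AA - C_AB / C_BB * C_AB            # C_AA - C_AB C_BB^{-1} C_BA

print(mean_A_given_B, var_A_given_B)      # 2.2 and 1.36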
8. Sequences

8.1. Random Sequences
Sequence of a random variable. Example: the result of one die roll is a random variable; rolling the die several times gives a random sequence.

8.2. Markov Sequence X_n: Ω → X_n
Sequence of memoryless state transitions with certain probabilities.
1st state:  f_{X_1}(x_1)
2nd state:  f_{X_2|X_1}(x_2|x_1)
n-th state: f_{X_n|X_{n−1}}(x_n|x_{n−1})

8.3. Hidden Markov Chains
Problem: the states X_i are not visible and can only be inferred indirectly through random variables Y_i.
[Figure: hidden Markov chain — states X_1, X_2, …, X_{n−1}, X_n with state-transition pdfs f_{X_2|X_1}, …, f_{X_n|X_{n−1}}; observations Y_1, …, Y_n with likelihoods f_{Y_1|X_1}, …, f_{Y_n|X_n}; conditional pdf f_{X_n|Y_n}]
9. Recursive Estimation

9.1. Kalman-Filter
Recursively calculates the most likely state from the previous state estimate and the current observation. Shows optimum performance for Gauss-Markov sequences.
State space:  x_n = G_n·x_{n−1} + B·u_n + v_n,     y_n = H_n·x_n + w_n
with Gaussian process/measurement noise v_n / w_n.
Short notation:  E[x_n | y^n] = x̂_{n|n},  E[x_n | y^{n−1}] = x̂_{n|n−1},  E[y_n | y^{n−1}] = ŷ_{n|n−1},  E[y_n | y^n] = ŷ_{n|n}
Init:  x̂_{0|−1} = E[X_0],   σ²_{0|−1} = Var[X_0]
1. step, prediction:
Mean:        x̂_{n|n−1} = G_n·x̂_{n−1|n−1}
Covariance:  C_{x,n|n−1} = G_n·C_{x,n−1|n−1}·G_n^⊤ + C_v
2. step, update:
Mean:        x̂_{n|n} = x̂_{n|n−1} + K_n·(y_n − H_n·x̂_{n|n−1})
Covariance:  C_{x,n|n} = C_{x,n|n−1} − K_n·H_n·C_{x,n|n−1}
Optimal Kalman gain (prediction for x_n based on Δy_n):
K_n = C_{x,n|n−1}·H_n^⊤·(H_n·C_{x,n|n−1}·H_n^⊤ + C_w)^{−1},  where the bracketed term is C_{Δy,n}
x̂_{n|n} = x̂_{n|n−1}  (estimate E[X_n | Y^{n−1} = y^{n−1}])  +  K_n·(y_n − H_n·x̂_{n|n−1})  (correction E[X_n | ΔY_n = Δy_n])
Innovation (closeness of the estimated mean value to the real value):
Δy_n = y_n − ŷ_{n|n−1} = y_n − H_n·x̂_{n|n−1}
For nonlinear problems: suboptimum nonlinear filters — Extended KF, Unscented KF, Particle Filter.
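Before the nonlinear variants, a minimal scalar Kalman filter following the prediction/update equations of 9.1 (a sketch with assumed state and noise parameters, not from the sheet):

import numpy as np

rng = np.random.default_rng(7)
G, H = 0.95, 1.0                          # state transition and observation model
var_v, var_w = 0.1, 1.0                   # process and measurement noise variances

x_true, x_est, P = 0.0, 0.0, 1.0          # init: x_hat_{0|-1} = E[X_0], P = Var[X_0]
for n in range(50):
    # simulate the true system
    x_true = G * x_true + rng.normal(0.0, np.sqrt(var_v))
    y = H * x_true + rng.normal(0.0, np.sqrt(var_w))

    # 1. prediction
    x_pred = G * x_est
    P_pred = G * P * G + var_v
    # 2. update with Kalman gain K_n
    K = P_pred * H / (H * P_pred * H + var_w)
    x_est = x_pred + K * (y - H * x_pred)          # innovation: y - H x_pred
    P = P_pred - K * H * P_pred

print(x_true, x_est, P)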
9.2. Extended Kalman Filter (EKF)
Linear approximation of the nonlinear functions g, h:
x_n = g_n(x_{n−1}, v_n),  v_n ∼ N,      y_n = h_n(x_{n−1}, w_n),  w_n ∼ N

9.3. Unscented Kalman Filter (UKF)
Approximation of the desired PDF f_{X_n|Y_n}(x_n|y_n) by a Gaussian PDF.
9.4. Particle Filter
For nonlinear state spaces and non-Gaussian noise.
Nonlinear state space:  x_n = g_n(x_{n−1}, v_n),     y_n = h_n(x_{n−1}, w_n)
Posterior conditional PDF:
f_{X_n|Y^n}(x_n|y^n) ∝ f_{Y_n|X_n}(y_n|x_n) · ∫ f_{X_n|X_{n−1}}(x_n|x_{n−1}) · f_{X_{n−1}|Y^{n−1}}(x_{n−1}|y^{n−1}) dx_{n−1}
(likelihood · state transition · last conditional PDF)
MMSE estimator:  x̂ = ∫ x_n·f_{X_n|Y^n}(x_n|y^n) dx_n
Monte-Carlo integration:  I = E[g(X)] ≈ I_N = (1/N)·Σ_{i=1}^{N} g(x_i)
Importance sampling: instead of f_X(x) sample from an importance density q_X(x):
I_N = (1/N)·Σ_i w̃_i·g(x_i)  with weights  w̃_i = f_X(x_i)/q_X(x_i)
If ∫ f_{X_n}(x) dx ≠ 1:  I_N = Σ_i w̃_i·g(x_i) / Σ_i w̃_i
The filter represents f_{X_n|Y^n} by N random particles with particle weights w_n^(i) at time n, propagated through the state transition and reweighted with the likelihood pdf f_{Y_n|X_n}.
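Monte-Carlo integration and importance sampling as used inside the particle filter, in a self-contained sketch (target E[g(X)] for X ~ N(0,1) with g(x) = x²; the proposal density is chosen arbitrarily):

import numpy as np

rng = np.random.default_rng(8)
g = lambda x: x**2                        # E[g(X)] = 1 for X ~ N(0, 1)
N = 100_000

# plain Monte-Carlo: sample from f_X directly
x = rng.normal(0.0, 1.0, size=N)
I_mc = g(x).mean()

# importance sampling: sample from a wider proposal q_X = N(0, 2^2)
xq = rng.normal(0.0, 2.0, size=N)
f = np.exp(-xq**2 / 2) / np.sqrt(2 * np.pi)          # target density f_X
q = np.exp(-xq**2 / 8) / np.sqrt(2 * np.pi * 4)      # proposal density q_X
w = f / q                                            # weights w_i = f_X(x_i) / q_X(x_i)
I_is = np.mean(w * g(xq))                            # or sum(w*g)/sum(w) if f is unnormalized

print(I_mc, I_is)                         # both ≈ 1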
9.5. Conditional Stochastic Independence
Given Y, the variables X and Z are conditionally independent if
f_{Z|Y,X}(z|y, x) = f_{Z|Y}(z|y)    or    f_{X,Z|Y}(x, z|y) = f_{Z|Y}(z|y)·f_{X|Y}(x|y)
f_{Z|X,Y}(z|x, y) = f_{Z|Y}(z|y)    or    f_{X|Z,Y}(x|z, y) = f_{X|Y}(x|y)
P(A ∩ B | E) = P(A|E)·P(B|E)
10. Hypothesis Testing
Making a decision based on the observations.

10.1. Definition
Null hypothesis H_0: θ ∈ Θ_0 (assumed to be true first)
Alternative hypothesis H_1: θ ∈ Θ_1 (the one to prove)
Decision rule φ: X → [0, 1] with φ(x) = 1: decide for H_1,  φ(x) = 0: decide for H_0
Error level α with E[d(X)|θ] ≤ α, ∀θ ∈ Θ_0

Decision \ Reality              | H_1 false (H_0 true)              | H_1 true (H_0 false)
H_1 rejected (H_0 accepted)     | True Negative, P = 1 − α          | False Negative (Type 2), P = β
H_1 accepted (detection)        | False Positive (Type 1,           | True Positive (Detection),
                                |  False Alarm), P = α              |  P = 1 − β

Power / sensitivity / recall / hit rate:  TP/(TP + FN) = 1 − β
Specificity / true negative rate:  TN/(FP + TN) = 1 − α
Precision / positive prediction rate:  TP/(TP + FP)
Accuracy:  (TP + TN)/(P + N) = (2 − α − β)/2

10.1.1 Design of a test
Cost criterion  G_φ: Θ → [0, 1],  θ ↦ E[d(X)|θ]
False-positive rate at most α:  G_d(θ) ≤ α, ∀θ ∈ Θ_0
False-negative rate as small as possible:  maximize G_d(θ), ∀θ ∈ Θ_1

10.2. Sufficient Statistics
Sufficiency of a test statistic T(X) means that no other statistic, i.e. no other function of the observations x, contains additional information about the parameter θ to be estimated:
f_{X|T}(x | T(x) = t, θ) = f_{X|T}(x | T(x) = t)
11. Tests

11.1. Neyman-Pearson-Test
Likelihood ratio:  R(x) = f_X(x; θ_1) / f_X(x; θ_0)
The best test of P_0 against P_1 is
d_NP(x) = 1 if R(x) > c,   γ if R(x) = c,   0 if R(x) < c
with  γ = (α − P_0({R > c})) / P_0({R = c})
Steps: for the error level α calculate x_α, then c = R(x_α).
Maximum-likelihood detector:  d_ML(x) = 1 if R(x) > 1,  0 otherwise
ROC graphs: plot G_d(θ_1) as a function of G_d(θ_0).
[Figure: pdfs f_X(x; θ_0) and f_X(x; θ_1) with threshold x_α and error level α]

11.2. Bayes Test (MAP Detector)
Prior knowledge on the possible hypotheses, P({θ ∈ Θ_0}) + P({θ ∈ Θ_1}) = 1; minimizes the probability of a wrong decision.
d_Bayes(x) = 1 if f_X(x|θ_1)/f_X(x|θ_0) > c_0·P(θ_0)/(c_1·P(θ_1)),  0 otherwise
           = 1 if P(θ_1|x) > P(θ_0|x),  0 otherwise   (for c_0 = c_1 = 1)
Risk weights c_0, c_1 are 1 by default.
If P(θ_0) = P(θ_1), the Bayes test is equivalent to the ML test.
Loss function:  L(d(x), θ) = c_0 (type 1) if d(x) = 1 but θ = θ_0;   c_1 (type 2) if d(x) = 0 but θ = θ_1
risk(d) = E[L(d(X), θ)] = E[ E[L(d(x), θ) | x = X] ]
Multiple hypotheses:  d_Bayes = 0 if x ∈ X_0,  1 if x ∈ X_1,  2 if x ∈ X_2
(X_0: null-hypothesis region, X_1, X_2: alternative regions)

11.3. Linear Alternative Tests
d: X → {0, 1},  x ↦ 1 if w^⊤·x − w_0 > 0,  0 otherwise
Estimate the normal vector w^⊤ that separates X into X_0 and X_1.
log R(x) = ½·ln(det(C_0)/det(C_1)) + ½·(x − µ_0)^⊤·C_0^{−1}·(x − µ_0) − ½·(x − µ_1)^⊤·C_1^{−1}·(x − µ_1) = 0
For two Gaussians with C_0 = C_1 = C:
w^⊤ = (µ_1 − µ_0)^⊤·C^{−1}   and constant translation   w_0 = (µ_1 + µ_0)^⊤·C^{−1}·(µ_1 − µ_0) / 2
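The equal-covariance case of 11.3 in code (means and covariance assumed for illustration, not from the sheet): compute w and w_0 and classify a point.

import numpy as np

mu0 = np.array([0.0, 0.0])
mu1 = np.array([2.0, 1.0])
C   = np.array([[1.0, 0.3],
                [0.3, 2.0]])              # common covariance C_0 = C_1 = C

C_inv = np.linalg.inv(C)
w  = C_inv @ (mu1 - mu0)                              # normal vector of the boundary
w0 = 0.5 * (mu1 + mu0) @ C_inv @ (mu1 - mu0)          # threshold w_0 (equal priors)

x = np.array([1.5, 0.0])                  # point to classify
decision = int(w @ x - w0 > 0)            # 1: decide H_1, 0: decide H_0
print(w, w0, decision)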