# 1. Drawbacks of PCA

PCA is a data dimensionality-reduction algorithm that can greatly speed up unsupervised feature learning.
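As a concrete illustration (a minimal NumPy sketch, not code from this text; the function name `pca` and the synthetic data are assumptions for the example), PCA projects centered data onto the leading eigenvectors of its covariance matrix:

```python
import numpy as np

def pca(X, k):
    """Project X (n_samples x n_features) onto its top-k principal components."""
    X_centered = X - X.mean(axis=0)
    # Covariance matrix is symmetric, so eigh applies
    cov = np.cov(X_centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # eigh returns eigenvalues in ascending order; take the k largest
    components = eigvecs[:, ::-1][:, :k]
    return X_centered @ components

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))   # illustrative random data
Z = pca(X, 2)
print(Z.shape)  # (100, 2)
```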

# 2. Linear Discriminant Analysis

Fisher's Linear Discriminant (FLD) and Linear Discriminant Analysis (LDA) are classical statistical methods; the difference between them is that LDA makes additional assumptions about the distribution and covariance of the variables. For uniformity, we will not distinguish FLD from LDA below and will refer to both as LDA. LDA can be used both for linear classification and purely for dimensionality reduction. It is widely applied in patient disease grading in medicine, market positioning in economics, product management, market research, face recognition, and machine learning. Compared with FLD, LDA assumes:
1. the sample data follow a normal distribution;
2. the classes share equal covariance matrices.

## 2.1 Binary Classification

Given $m$ samples with binary class labels:

$$x = \{x^{(1)}, x^{(2)}, \cdots, x^{(m)}\}, \qquad z \in \{0, 1\}$$

Project each sample onto the direction $w$:

$$y^{(i)} = w^T x^{(i)}, \qquad i \in [1, m]$$

The mean of class $i$ (with $N_i$ samples) in the original space is

$$\mu_i = \frac{1}{N_i} \sum_{x \in z_i} x$$

The projected class mean is then

$$\hat{\mu}_i = \frac{1}{N_i} \sum_{x \in z_i} w^T x = w^T \mu_i$$

A first objective is to maximize the distance between the projected means:

$$J(w) = |\hat{\mu}_1 - \hat{\mu}_2| = |w^T(\mu_1 - \mu_2)| \qquad (2.1)$$

The scatter of class $i$ after projection is

$$\begin{aligned}
\hat{S}_i &= \sum_{x \in z_i} (w^T x - \hat{\mu}_i)^2 \\
&= \sum_{x \in z_i} (w^T x - w^T \mu_i)^2 \\
&= \sum_{x \in z_i} (w^T x - w^T \mu_i)(w^T x - w^T \mu_i)^T \\
&= \sum_{x \in z_i} w^T (x - \mu_i)(x - \mu_i)^T w
\end{aligned}$$

where, since $w^T x - w^T \mu_i$ is a scalar,

$$(w^T x - w^T \mu_i)^2 = (w^T x - w^T \mu_i)(w^T x - w^T \mu_i)^T = (w^T x - w^T \mu_i)^T (w^T x - w^T \mu_i)$$

Defining the scatter matrix of class $i$ as

$$S_i = \sum_{x \in z_i} (x - \mu_i)(x - \mu_i)^T$$

gives

$$\hat{S}_i = w^T S_i w$$

The Fisher criterion combines both goals, maximizing the separation of the projected means while minimizing the projected scatter:

$$J(w) = \frac{|\hat{\mu}_1 - \hat{\mu}_2|^2}{\hat{S}_1 + \hat{S}_2}$$

Rewriting numerator and denominator in terms of $w$:

$$\begin{aligned}
|\hat{\mu}_1 - \hat{\mu}_2|^2 &= w^T (\mu_1 - \mu_2)(\mu_1 - \mu_2)^T w \\
\hat{S}_1 + \hat{S}_2 &= w^T (S_1 + S_2) w
\end{aligned}$$

and defining the between-class and within-class scatter matrices

$$\begin{aligned}
S_B &= (\mu_1 - \mu_2)(\mu_1 - \mu_2)^T \\
S_W &= S_1 + S_2
\end{aligned}$$

the criterion becomes

$$J(w) = \frac{w^T S_B w}{w^T S_W w}$$

Since $J(w)$ is invariant to the scale of $w$, we can fix the denominator with the constraint

$$w^T S_W w = 1$$

and maximize the numerator with a Lagrange multiplier:

$$\begin{aligned}
f(\lambda, w) &= w^T S_B w + \lambda (1 - w^T S_W w) \\
\nabla_w f(\lambda, w) &= 2 S_B w - 2\lambda S_W w = 0 \\
\Rightarrow \quad & S_B w = \lambda S_W w
\end{aligned}$$

If $S_W$ is invertible, this is an ordinary eigenvalue problem:

$$(S_W^{-1} S_B) \, w = \lambda w$$
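Numerically, this eigenvalue problem can be solved directly. The sketch below (synthetic two-class Gaussian data; all variable names are illustrative, not from the text) checks that the dominant eigenvector of $S_W^{-1} S_B$ is parallel to $S_W^{-1}(\mu_1 - \mu_2)$:

```python
import numpy as np

rng = np.random.default_rng(1)
X1 = rng.normal(loc=[0.0, 0.0], size=(50, 2))  # class 1 samples
X2 = rng.normal(loc=[3.0, 3.0], size=(50, 2))  # class 2 samples

mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
S_W = (X1 - mu1).T @ (X1 - mu1) + (X2 - mu2).T @ (X2 - mu2)
S_B = np.outer(mu1 - mu2, mu1 - mu2)

# Solve (S_W^{-1} S_B) w = lambda w as an ordinary eigenproblem
eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
w = eigvecs[:, np.argmax(eigvals.real)].real  # eigenvector of the largest eigenvalue

# Since S_B is rank one, the dominant eigenvector should be
# parallel to S_W^{-1}(mu1 - mu2)
w_direct = np.linalg.solve(S_W, mu1 - mu2)
cosine = abs(w @ w_direct) / (np.linalg.norm(w) * np.linalg.norm(w_direct))
```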

For the two-class case the eigendecomposition can be avoided entirely. Since $(\mu_1 - \mu_2)^T w$ is a scalar, denote it $\lambda_w$:

$$\begin{aligned}
S_B w &= (\mu_1 - \mu_2)(\mu_1 - \mu_2)^T w \\
\lambda_w &= (\mu_1 - \mu_2)^T w
\end{aligned}$$

Then

$$S_W^{-1} S_B w = S_W^{-1} (\mu_1 - \mu_2) \, \lambda_w = \lambda w$$

Since only the direction of $w$ matters, the scalars $\lambda_w$ and $\lambda$ can be dropped, giving

$$w = S_W^{-1} (\mu_1 - \mu_2)$$
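A minimal sketch of this closed-form solution on synthetic data (all names and dimensions are illustrative assumptions); `np.linalg.solve` is used instead of forming the inverse explicitly:

```python
import numpy as np

rng = np.random.default_rng(2)
X1 = rng.normal(loc=[0.0, 0.0], size=(60, 2))  # class 1 samples
X2 = rng.normal(loc=[4.0, 1.0], size=(60, 2))  # class 2 samples

mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
S_W = (X1 - mu1).T @ (X1 - mu1) + (X2 - mu2).T @ (X2 - mu2)

# Closed-form Fisher direction: w = S_W^{-1} (mu1 - mu2)
w = np.linalg.solve(S_W, mu1 - mu2)

# Projections of the two classes should be well separated along w
y1, y2 = X1 @ w, X2 @ w
```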

1. This assumes $S_W$ is invertible. When the feature dimension is high but the number of samples is small, $S_W$ may be singular. In that case, one can first reduce the dimensionality of the samples with PCA and then apply LDA to the reduced data.
2. When solving for $w$, we did not compute the eigenvectors of $S_W^{-1} S_B$: this matrix is not necessarily symmetric, so its eigenvectors cannot be obtained via singular value decomposition, and solving the eigenvalue problem in the ordinary way takes $O(n^3)$ time.

//todo