## Can't Deep Learning Do Causal Inference?

Learnable Inductive Biases in Neural Networks, by Mark van der Wilk
A growing research group focuses on Gaussian process inference backed by theory, to make reliable decision-making systems, and on automatic learning of inductive biases in neural networks: when should neurons be connected?
Hyperparameter selection and architecture design.
Every time we train a NN we need to decide on hyperparameters:

- How many layers? How many units in a layer?
- What layer structure? Convolutional? Skip connections?
- Data augmentation parameters?

As architectures get more complex (multi-task):

- Which layers to share?
- What kind of task-specific layers?
- How much capacity to assign to each task?
The main tool is trial and error (cross-validation).

Invariances: every prediction problem needs an inductive bias. For $f(x)$ with $x$ an image and $y$ a label, the inductive bias determines behaviour on unseen inputs.
Architecture determines the inductive bias through equivariances. Convolutions are a common solution: translational equivariance.
Can we automatically adjust invariance properties in layers?
Summary of the goal: given a dataset, adapt the inductive bias to it. Key requirements: find a parameterisation for different inductive biases, and find a learning objective that works for inductive biases. We want to optimise it through backprop, so it is easy. We will look at: invariances and equivariances, parameterised through transformations on the input (data augmentation) or transformations on the filter (convolutions); how Bayes helps with finding a learning objective; single-layer and deep models.

Invariance, data augmentation and the training loss. Data augmentation:

Take a dataset

$y=\{(x_n, y_n)\}_{n=1}^N$

Create a larger dataset by applying transformations $t_i$ to every input while keeping the label:

$y'=\{(t_i(x_n), y_n)\}_{n=1,\,i=1}^{N,\,I}$
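A minimal sketch of building the enlarged dataset from a list of transformations (the function name and the identity/flip transformation family are illustrative assumptions):

```python
import numpy as np

def augment_dataset(xs, ys, transforms):
    """Build y' = {(t_i(x_n), y_n)} from y = {(x_n, y_n)}: apply every
    transformation t_i to every input while keeping the label unchanged."""
    xs_aug, ys_aug = [], []
    for x, y in zip(xs, ys):
        for t in transforms:
            xs_aug.append(t(x))
            ys_aug.append(y)
    return np.array(xs_aug), np.array(ys_aug)

# Illustrative example: 4 two-pixel "images", identity and flip as transformations.
xs = np.arange(8.0).reshape(4, 2)
ys = np.array([0, 1, 0, 1])
transforms = [lambda x: x, lambda x: x[::-1]]
xs_aug, ys_aug = augment_dataset(xs, ys, transforms)
print(xs_aug.shape)  # (8, 2): N * I examples
```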

Making models invariant: we want $f(x)=f(t_i(x))$ for the transformations $t_i$. Training on the augmented dataset gives weights $w^*$ for which $f(x)\approx f(t_i(x))$. Parameterise a distribution over transformations $p(t\mid\theta)$, where $\theta$ controls the amount of different transformations to apply.

But what if we do not know the transformations for which $f(x)=f(t_i(x))$ should hold? We would like to learn $\theta$ as well:

$w^*,\theta^*=\mathop{\arg\min}\limits_{w,\theta} L_{\mathrm{train}}(w,\theta)$

but simply minimising the training loss drives $p(t\mid\theta)$ towards no transformation at all, since augmentation can only make the training data harder to fit.
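To see why averaging predictions over a transformation family yields invariance, here is a minimal sketch (the cyclic-shift group and the linear base predictor are illustrative assumptions, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=3)

def g(x):
    # An arbitrary base predictor that is NOT invariant on its own.
    return float(w @ x)

def f(x, transforms):
    # Averaging over a *group* of transformations makes f exactly invariant:
    # applying t_j first only permutes the set {t_i(t_j(x))} = {t_i(x)}.
    return float(np.mean([g(t(x)) for t in transforms]))

# Cyclic shifts of a length-3 vector form a group.
transforms = [lambda x, k=k: np.roll(x, k) for k in range(3)]
x = np.array([1.0, 2.0, 3.0])
print(np.isclose(f(x, transforms), f(np.roll(x, 1), transforms)))  # True
```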

Bayesian model selection

Consider a linear-in-features model:

$f_{w,\theta}(x)=\phi_\theta(x)^T w = \sum_{i=1}^K \phi_\theta^{(i)}(x)\,w_i$

Minimising the training loss $L_{\mathrm{train}}$ cannot select $\theta$, so we need a different objective.

$p(f,\theta\mid y)=\dfrac{p(y\mid f)\,p(f\mid\theta)\,p(\theta)}{p(y)}=\dfrac{p(y\mid f)\,p(f\mid\theta)}{p(y\mid\theta)}\cdot\dfrac{p(y\mid\theta)\,p(\theta)}{p(y)}$

The second factor contains the marginal likelihood, obtained by integrating out the function:

$p(y\mid\theta)=\int p(y\mid f)\,p(f\mid\theta)\,df$

For a NN this integral is intractable, so we use a variational lower bound, the ELBO.
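For a linear-in-features model the integral is tractable in closed form, which makes the model-selection behaviour easy to see. A sketch, assuming a Gaussian prior $w\sim\mathcal{N}(0,\sigma_w^2 I)$ and Gaussian observation noise (the function names and toy data are illustrative):

```python
import numpy as np

def log_marginal_likelihood(Phi, y, sig_w=1.0, sig_n=0.1):
    """log p(y | theta) = log N(y; 0, sig_w^2 Phi Phi^T + sig_n^2 I) for the
    model f = Phi w with prior w ~ N(0, sig_w^2 I): w is integrated out."""
    N = len(y)
    K = sig_w**2 * (Phi @ Phi.T) + sig_n**2 * np.eye(N)
    _, logdet = np.linalg.slogdet(K)
    return -0.5 * (y @ np.linalg.solve(K, y) + logdet + N * np.log(2 * np.pi))

# Toy data from a quadratic: the marginal likelihood prefers the feature set
# (the "theta") that matches the data, trading off fit against complexity.
rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 20)
y = x**2 + 0.1 * rng.normal(size=20)
Phi_lin = np.stack([np.ones(20), x], axis=1)          # theta = linear features
Phi_quad = np.stack([np.ones(20), x, x**2], axis=1)   # theta = quadratic features
print(log_marginal_likelihood(Phi_quad, y) > log_marginal_likelihood(Phi_lin, y))
```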

Learning data augmentation: formulate learning data augmentations as Bayesian hyperparameter learning.

$f_{w,\theta}(x)=\mathbb{E}_{p(t\mid\theta)}\!\left[\phi_\theta(t(x))^T w\right]$

$p(y_n\mid w,\theta)=\mathcal{N}\!\left(y_n \mid f_{w,\theta}(x_n),\,\sigma^2\right)$

$\log p(y\mid\theta) \ge \mathcal{L}(q(w),\theta) = \sum_{n} \mathbb{E}_{q(w)}\!\left[\log p\!\left(y_n \,\middle|\, \mathbb{E}_{p(t\mid\theta)}\!\left[\phi_\theta(t(x_n))^T w\right]\right)\right] - \mathrm{KL}\!\left(q(w)\,\|\,p(w)\right)$

Maximising $\mathcal{L}(q(w),\theta)$ jointly over $q(w)$ and $\theta$ trains the weights and learns the transformation distribution $p(t\mid\theta)$ by backprop.
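A Monte Carlo sketch of the expected-feature prediction inside the bound, taking $p(t\mid\theta)$ to be random input shifts whose spread is $\theta$ (both the feature map and the shift family are illustrative assumptions; the talk leaves them abstract):

```python
import numpy as np

rng = np.random.default_rng(0)

def phi(x):
    # Illustrative feature map standing in for phi_theta.
    return np.array([x.sum(), (x**2).sum()])

def f(x, w, theta, S=2000):
    """Monte Carlo estimate of f_{w,theta}(x) = E_{p(t|theta)}[phi(t(x))^T w],
    with t(x) = x + eps, eps ~ N(0, theta^2): theta controls how much
    transformation is applied (theta = 0 recovers plain phi(x)^T w)."""
    eps = rng.normal(0.0, theta, size=(S,) + x.shape)
    feats = np.stack([phi(x + e) for e in eps])
    return float(feats.mean(axis=0) @ w)

x = np.array([1.0, 2.0])
w = np.array([0.5, -0.25])
print(f(x, w, theta=0.0), f(x, w, theta=0.5))
```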



One model, two problems: the same model and objective $\mathcal{L}(q(\mathbf{w}),\theta)$ learn a different invariance on each dataset.



Learning equivariance: previously we added invariance by transforming the input image,

$f_{\mathbf{w},\theta}(\mathbf{x})=\mathbb{E}_{p(t\mid\theta)}\!\left[\phi_\theta(t(\mathbf{x}))^T \mathbf{w}\right]$

The same idea can instead be applied to the filters of a convolution.
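A minimal sketch of the filter-side version: convolve with every transformed copy of a filter and pool over the copies (the 1-D edge filter and the identity/flip family are illustrative assumptions):

```python
import numpy as np

def pooled_response(x, base_filter, filter_transforms):
    """Instead of transforming the input, transform the *filter*: convolve x
    with every t(filter), then pool over the transformed copies."""
    responses = [np.convolve(x, t(base_filter), mode="valid")
                 for t in filter_transforms]
    return np.max(responses, axis=0)

base = np.array([1.0, -1.0])                    # a simple edge filter
transforms = [lambda k: k, lambda k: k[::-1]]   # identity and flip
x = np.array([0.0, 1.0, 2.0, 0.0])

# Flipping the input just flips the pooled response: equivariance.
out = pooled_response(x, base, transforms)
out_flipped = pooled_response(x[::-1], base, transforms)
print(np.allclose(out_flipped, out[::-1]))  # True
```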



Single architecture for different domains. Learning invariances using the marginal likelihood: learning invariance by backprop, but for Gaussian processes only. Data augmentation in BNNs and the cold posterior effect: whether a principled approach to DA influences the cold posterior.

Summary: given a dataset, adapt the inductive bias to it. Key requirements: a parameterisation and a learning objective; backprop through invariances.

Outlook: better than trial-and-error design of NNs. Bayesian methods are helping to automate the selection of invariances, making it as easy as backprop. This can make NNs more accurate, easier to use, and more energy efficient. Get to the smarter neuron! Meta-learning? More Bayes? Causality?