dc.description.abstract |
As the depth of a neural network increases, the added non-linearity and parameters allow it to learn more complex functions. While network deepening has proven effective, there is still an opportunity for efficient feature extraction within a layer that improves overall performance for a given network complexity. Widening networks by adding more filters to each layer is the naive approach to strengthening layer-wise feature extraction. It is an inefficient scaling option, since the number of parameters grows quadratically with the number of filters employed per layer. In contrast, parallel extractors in each layer provide an efficient scaling option. However, without context-dependent input allocation among these processes, such parallel computations tend to learn similar features, collapsing to a single computation. Thus, it is vital to study the parallel stacking of computations layer-wise and to design a routing method that allocates incoming feature maps to these computations. The expected outcome is to group homogeneous feature maps in parallel layers and to assign an exclusive filter set to each group (path), so that the filter sets of each path can specialize in extracting features exclusive to that group. To allow the network input to be routed end-to-end over such parallel paths, we propose data-dependent, layer-wise parallel resource allocation methods. Given a layer of parallel tensors, we first employ sub-networks that produce gating coefficients to weigh the cross-connections to the next layer of parallel tensors. Each of the next layer's parallel tensors is then constructed as a summation of the current layer's tensors, each weighted by the corresponding gating coefficient. On image recognition benchmarks, we demonstrate that our multi-path networks outperform previous widening and adaptive feature extraction methods, ensembles, and deeper networks of comparable complexity. To further regularize the gating sub-networks, we view a gating network's path allocation as a soft clustering of its input feature maps. We therefore propose a neural mixture model-based clustering objective as a regularization loss for the gating networks, which we first study as a standalone neural network-based clustering approach. The proposed clustering framework uses a neural network to learn cluster distributions in mixture modeling instead of tuning human-defined distributions. We adopt the Expectation-Maximization (EM) algorithm to train the network and perform batch-wise EM iterations, where the forward pass acts as the E-step and the backward pass as the M-step. For image clustering, we use the mixture-based EM objective as the clustering objective, along with consistency optimization. Our networks outperform traditional and single-stage deep clustering methods that still depend on k-means. Finally, we propose using this clustering objective to regularize the gating networks toward distributed gating activation patterns. We show that skewed gating patterns can be improved by applying this loss as a local regularizer. We further present the need for a global regularization method that takes the end-task performance into account. We also suggest extending this research toward sparse resource allocation, along with gating networks that handle more diversity. |
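
As a concrete illustration of the cross-connection gating described in the abstract, the following PyTorch sketch routes a layer of parallel tensors to the next layer's parallel tensors via data-dependent gating coefficients. The module layout, the pooled-summary gating sub-network, and the softmax normalization over source paths are illustrative assumptions, not the thesis implementation.

    # Minimal sketch of layer-wise soft routing between parallel paths (PyTorch).
    # Names and the normalization choice are illustrative assumptions.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CrossConnectionGate(nn.Module):
        """Routes num_paths parallel feature tensors to the next layer's
        parallel tensors via gating coefficients from a small sub-network."""
        def __init__(self, num_paths: int, channels: int):
            super().__init__()
            # Gating sub-network: one coefficient per (source, destination)
            # cross-connection, computed from pooled path summaries.
            self.gate = nn.Linear(num_paths * channels, num_paths * num_paths)
            self.num_paths = num_paths

        def forward(self, paths):  # paths: list of (B, C, H, W) tensors
            # Summarize each path by global average pooling, then concatenate.
            summary = torch.cat([p.mean(dim=(2, 3)) for p in paths], dim=1)
            # Gating coefficients, normalized over source paths per destination.
            g = self.gate(summary).view(-1, self.num_paths, self.num_paths)
            g = F.softmax(g, dim=1)                 # (B, source P, dest P)
            stacked = torch.stack(paths, dim=1)     # (B, P, C, H, W)
            # Each next-layer tensor is a gated sum of all current-layer tensors.
            out = torch.einsum('bsd,bschw->bdchw', g, stacked)
            return list(out.unbind(dim=1))

Because every destination tensor is a differentiable weighted sum over all source paths, backpropagation through the gating sub-network can softly group homogeneous feature maps among paths end-to-end, as the abstract describes.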
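
The batch-wise EM training of the neural mixture model can be sketched as follows, assuming PyTorch and unit-variance Gaussian components; the encoder interface and all names are illustrative assumptions rather than the thesis code.

    # Sketch of batch-wise EM for neural mixture clustering: the forward pass
    # computes detached responsibilities (E-step), and the optimizer step on
    # the returned loss plays the role of the M-step in the backward pass.
    import torch
    import torch.nn as nn

    class NeuralMixture(nn.Module):
        def __init__(self, encoder: nn.Module, feat_dim: int, k: int):
            super().__init__()
            self.encoder = encoder
            # Learnable cluster parameters replace hand-tuned distributions.
            self.means = nn.Parameter(torch.randn(k, feat_dim))
            self.log_pi = nn.Parameter(torch.zeros(k))  # mixing-weight logits

        def forward(self, x):
            z = self.encoder(x)                              # (B, D)
            # Log-likelihood of each point under each unit-variance component.
            sq = torch.cdist(z, self.means) ** 2             # (B, K)
            log_lik = -0.5 * sq + torch.log_softmax(self.log_pi, dim=0)
            # E-step: posterior responsibilities, detached from the graph.
            with torch.no_grad():
                resp = torch.softmax(log_lik, dim=1)
            # M-step surrogate: negative expected complete-data log-likelihood.
            loss = -(resp * log_lik).sum(dim=1).mean()
            return loss, resp

Since the responsibilities are detached, each batch fixes the E-step in the forward pass, and the subsequent gradient step updates the encoder and mixture parameters, mirroring the forward/backward EM correspondence stated in the abstract.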
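
The local regularization of gating networks might then be wired as in this hypothetical training step, where lam is an assumed hyper-parameter weighing the clustering loss against the end-task loss and the model is assumed to expose its gating inputs.

    # Illustrative training step combining the end-task loss with the
    # mixture-based clustering regularizer on the gating inputs.
    def regularized_step(model, mixture, batch, targets, criterion, optimizer, lam=0.1):
        logits, gating_inputs = model(batch)        # model assumed to expose gating inputs
        cluster_loss, _ = mixture(gating_inputs)    # NeuralMixture from the sketch above
        loss = criterion(logits, targets) + lam * cluster_loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()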
en_US |