A Novel Learnable Orthogonal Wavelet Unit Neural Network with Perfect Reconstruction Constraint Relaxation for Image Classification

CNNs rely heavily on lowpass frequency features, as evidenced by max pooling operations that preserve dominant features. To address this, we use a one-layer fully connected network (FCN) that exploits both the coarse and detail components obtained via the discrete wavelet transform (DWT). We then develop UwU, a learnable wavelet-based unit that integrates perfect reconstruction (PR) constraint relaxation, allowing fine-tuning of the feature-map components. Distinctively, UwU's coefficients are trainable, unlike those in prior studies. This is the first work to utilize PR constraint relaxation to enhance CNNs. Our techniques enhance the stride-convolution, pooling, and downsampling units of CNNs. We evaluated the improved units on ResNet-family architectures against traditional frequency- and wavelet-based units. Performance metrics on CIFAR10, ImageNet1K, and DTD show promising results: while CIFAR10 accuracy is on par with other methods, there is a marked performance improvement on ImageNet1K and DTD. Furthermore, integrating UwU into the ResNet18-based encoder of the CFLOW-AD system yields competitive anomaly detection results, particularly for hazelnut images in the MVTecAD dataset.

As shown in Fig. 1, the CIFAR10 [11] sample has most of its information concentrated in the low-frequency region, i.e., the approximation component. In contrast, the samples from ImageNet1K [12], MVTecAD [13], [14], and DTD [15] carry information in both the low-frequency (approximation) and high-frequency (detail) regions. In particular, the "cracked" DTD sample in the fourth column of Fig. 1 contains only a few features of the "cracked" texture in the approximation; the texture dominates the detail (highpass) components X_hl, X_lh, and X_hh. For enhanced CNN classification, we employ all DWT multi-resolution components and integrate a one-layer FCN to find optimal feature maps. We introduce UwU, a novel learnable orthogonal wavelet unit, notable for its PR constraint relaxation and its trainable coefficients, unlike conventional static ones. These innovations have been embedded into ResNet18, enhancing the downsampling, pooling, and stride-convolution processes in CNNs. The networks were validated against the baseline ResNet18 along with its wavelet-based variants WaveCNet [9], Wavelet-Attention CNN [16], and CWNN [10], and spectrum-based variants such as SpectralPooling [7] and DiffStride [8], on the CIFAR10 [11], ImageNet1K [12], and DTD [15] datasets. We further applied the proposed units in the encoder of the CFLOW-AD [17] pipeline to the anomaly detection task for the hazelnut category in the MVTecAD [13], [14] dataset. In summary, our contributions are as follows:
• We propose UwU, a learnable orthogonal wavelet unit paired with a one-layer FCN for processing all subband components of the DWT.
• Inspired by the perfect reconstruction constraint for the filter bank, which consists of distortion and aliasing conditions, we introduce a new distortion loss function to design a new set of filters for the DWT while maintaining aliasing cancellation.
• We apply the proposed methods to a wide range of image classification datasets (CIFAR10, ImageNet1K, and DTD) and achieve excellent performance.
• The proposed units are also implemented in the CFLOW-AD anomaly detection model and tested on the MVTecAD dataset.

II. RELATED WORKS
Max pooling in CNNs down-samples feature maps, retaining maximal values to preserve prominent features [18], [19]. However, without filtering, it can lead to aliasing artifacts, violating the sampling theorem [6]. This results in frequency overlaps in signal processing and Moiré patterns in imaging. Furthermore, max pooling can compromise object structures in deep networks [9].
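The fold-over is easy to reproduce. In the illustrative NumPy sketch below (ours, not from the paper), decimating an unfiltered cosine by 2 folds a frequency above the new Nyquist limit back into the low band, which is exactly the frequency-overlap effect described above.

```python
import numpy as np

def dominant_freq(sig):
    """Normalized frequency (cycles/sample) of the strongest DFT bin."""
    spec = np.abs(np.fft.rfft(sig))
    return np.fft.rfftfreq(len(sig))[np.argmax(spec)]

n = np.arange(256)
x = np.cos(2 * np.pi * 0.375 * n)   # 0.375 cycles/sample

# Stride-2 decimation with no low-pass filter, as in max/stride pooling.
y_naive = x[::2]

# 0.375 exceeds the post-decimation Nyquist limit of 0.25, so it folds:
# 2 * 0.375 = 0.75  ->  1 - 0.75 = 0.25 cycles/sample.
print(dominant_freq(x), dominant_freq(y_naive))  # 0.375 0.25
```

A wavelet-based unit avoids this fold-over because its low-pass analysis filter attenuates the offending band before decimation.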
Spectral pooling [7] uses the DFT to pool in the frequency domain, emphasizing lower frequencies and omitting higher ones. However, its non-differentiability with respect to strides [8] requires predefined hyper-parameters for each downsampling layer. While DiffStride [8] optimizes stride numbers and cropping sizes via back-propagation, detail information in the feature map is still discarded.
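A minimal NumPy sketch of DFT-domain pooling in the spirit of spectral pooling (the function name and scaling convention are ours; the actual method also addresses differentiability): the centered spectrum is cropped, so low frequencies survive while high frequencies are discarded outright.

```python
import numpy as np

def spectral_pool(x, out_h, out_w):
    """Keep the centered (out_h, out_w) low-frequency block of the 2D DFT."""
    h, w = x.shape
    spec = np.fft.fftshift(np.fft.fft2(x))     # zero frequency at the center
    top, left = (h - out_h) // 2, (w - out_w) // 2
    crop = spec[top:top + out_h, left:left + out_w]
    # Rescale so mean intensity is preserved after the smaller inverse DFT.
    return np.real(np.fft.ifft2(np.fft.ifftshift(crop))) * (out_h * out_w) / (h * w)

flat = np.ones((8, 8))                                         # pure DC
checker = (-1.0) ** np.add.outer(np.arange(8), np.arange(8))   # pure Nyquist
print(spectral_pool(flat, 4, 4)[0, 0])             # ~1.0 (DC survives)
print(np.abs(spectral_pool(checker, 4, 4)).max())  # ~0.0 (detail is lost)
```

This illustrates the trade-off noted above: no aliasing occurs, but the high-frequency content is simply gone rather than being routed into detail subbands.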
DWT-based methods use wavelet bases in CNNs. Through the DWT or FWT [20], [21], these methods enable CNNs to operate in the wavelet domain, facilitating PR and avoiding aliasing. Such techniques are evident in recent image classification works [9], [10], [16]. In [9], [22], only the approximation components of wavelets are used. In [22], a second-level wavelet decomposition is used, but only its second-level sub-bands are used for reconstruction. In [9], the first-level approximation is employed for feature maps. In [16], attention maps are derived from vertical and horizontal details, and in [10], the dual-tree complex wavelet transformation (DT-CWT) is used, and its decomposed components are averaged as the down-sampling output. Distinctively, our UwU offers trainable coefficients and integrates PR constraint relaxation, diverging from prior fixed-coefficient approaches.

III. PROPOSED METHOD
We introduce a novel learnable universal wavelet unit (UwU) with an orthogonal filter bank. Paired with a one-layer FCN, it optimizes feature maps and can serve as a downsampling or pooling unit. Additionally, replacing a stride-convolution layer with a non-stride version, followed by UwU, retains the detail components of the convolution output.

A. Universal Wavelet Unit with PR Constraints
DWT analysis and synthesis depend on filter banks meeting the PR constraint. This ensures that signals decomposed by the analysis filters are perfectly reconstructed by the synthesis ones. The PR constraint consists of both the Aliasing Cancellation and the No Distortion/Half-band conditions [20], [21]. In recent works [9], [16], predefined orthogonal wavelets such as Haar, Daubechies, and Symlet [20], [21], [23] are used. In this work, we propose a wavelet unit with trainable coefficients that still abides by the Aliasing Cancellation and Half-band conditions. The filter bank structure is shown in Fig. 2.
In Fig. 2, the analysis and synthesis parts of the filter bank are shown in the blue and red rectangular boxes, respectively. H_0 and H_1 are, correspondingly, the low-pass and high-pass filters of the analysis part, whereas F_0 and F_1 are, respectively, the low-pass and high-pass filters of the synthesis part. To satisfy the Aliasing Cancellation condition of an orthogonal filter bank, with h_0 = [h(0), h(1), ..., h(N - 1)] as the coefficients of H_0 with N taps, the coefficients of the other filters can be found with the order-flip, alternating-flip, and alternating-signs relations, which can be expressed as follows [21]:

f_0(n) = h(N - 1 - n),   h_1(n) = (-1)^n h(N - 1 - n),   f_1(n) = (-1)^{n+1} h(n),   n = 0, ..., N - 1.   (1)

In (1), f_0, h_1, and f_1 are the filter coefficients of F_0, H_1, and F_1, respectively. From the relations presented in (1), the filter bank satisfies the anti-aliasing condition of the PR constraint. Moreover, with the Aliasing Cancellation condition, only the filter coefficients of H_0 need to be found, which reduces the number of parameters needed for the analysis part used in a classification model. In addition, to fulfill the PR constraint, the filter coefficients need to satisfy the Half-band condition as follows [21]:

P(z) + P(-z) = 2,   (2)

where P(z) = H_0(z) H_0(z^{-1}). From (2), the condition on the filter coefficients can be expressed as follows:

Σ_n h(n) h(n + 2k) = δ(k).   (3)

From (3), the loss function for the PR constraint based on the Half-band condition can be mathematically expressed as follows:

L_PR = Σ_k ( Σ_n h(n) h(n + 2k) - δ(k) )^2.   (4)

From (1) and (4), the PR constraint is implemented to train the analysis filter bank. The relaxation of the PR constraint is done by multiplying L_PR with a factor α. In our image classification study using the cross-entropy loss L_CE, an α of 0.01 gave the best results for all image experiments. A higher α strengthens the No Distortion constraint, while relaxing it allows more fine-tuning of the coefficients. The total loss function L is expressed as follows:

L = L_CE + α L_PR.   (5)
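The relations in (1) and the penalty in (4) translate directly into code. The NumPy sketch below (function names are ours) derives the remaining filters from h_0 and evaluates the Half-band distortion loss; standard Haar and Daubechies-2 taps serve only as known-orthogonal sanity checks that should incur near-zero loss.

```python
import numpy as np

def derived_filters(h0):
    """Eq. (1): order flip, alternating flip, alternating signs."""
    sign = (-1.0) ** np.arange(len(h0))
    f0 = h0[::-1]            # f0(n) = h(N-1-n)
    h1 = sign * h0[::-1]     # h1(n) = (-1)^n h(N-1-n)
    f1 = -sign * h0          # f1(n) = (-1)^(n+1) h(n)
    return f0, h1, f1

def pr_loss(h0):
    """Eq. (4): sum_k (sum_n h(n)h(n+2k) - delta(k))^2."""
    p = np.correlate(h0, h0, mode="full")    # autocorrelation, lags -(N-1)..N-1
    even = p[len(h0) - 1::2]                 # non-negative even lags 0, 2, ...
    target = np.zeros_like(even)
    target[0] = 1.0                          # delta(k)
    return float(np.sum((even - target) ** 2))

haar = np.array([1.0, 1.0]) / np.sqrt(2.0)
db2 = np.array([1 + np.sqrt(3), 3 + np.sqrt(3),
                3 - np.sqrt(3), 1 - np.sqrt(3)]) / (4.0 * np.sqrt(2.0))
print(pr_loss(haar), pr_loss(db2))   # both ~0: orthogonal wavelets
```

During training, (5) would then be applied as `total_loss = ce_loss + 0.01 * pr_loss(h0)`, so gradient steps may drift h_0 slightly off the Half-band manifold in exchange for classification accuracy.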

B. 2D Implementation
From the filter coefficients satisfying the PR constraint stated in the previous section, the low-pass and high-pass filter matrices L and H are computed to find the approximation X_ll and the detail components X_lh, X_hl, and X_hh. L can be computed as follows:

L = D T,

where D is the downsampling matrix and T is a Toeplitz matrix with the filter coefficients of H_0(z). H has a similar form to L, with the filter coefficients of H_0(z^{-1}). Using H and L, the components X_ll, X_lh, X_hl, and X_hh are computed as follows:

X_ll = L X L^T,   X_lh = L X H^T,   X_hl = H X L^T,   X_hh = H X H^T,

where X is the input feature-map channel.
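A small NumPy sketch of the 2D analysis step (the helper is ours; circular boundary handling and the row/column subscript convention are implementation assumptions). Because the stacked rows of L and H form an orthogonal matrix, the four subbands together preserve the energy of the input, which is the property the Aliasing Cancellation relations provide.

```python
import numpy as np

def analysis_matrix(h, size):
    """D @ Toeplitz: each row places the filter taps at an even (circular) shift."""
    m = np.zeros((size // 2, size))
    for r in range(size // 2):
        for k, c in enumerate(h):
            m[r, (2 * r + k) % size] = c
    return m

h0 = np.array([1.0, 1.0]) / np.sqrt(2.0)    # Haar low-pass
h1 = np.array([1.0, -1.0]) / np.sqrt(2.0)   # high-pass via alternating flip

X = np.arange(16.0).reshape(4, 4)
L = analysis_matrix(h0, 4)
H = analysis_matrix(h1, 4)

X_ll = L @ X @ L.T    # approximation
X_lh = L @ X @ H.T    # detail (rows low-pass, columns high-pass)
X_hl = H @ X @ L.T    # detail (rows high-pass, columns low-pass)
X_hh = H @ X @ H.T    # diagonal detail

energy_in = np.sum(X ** 2)
energy_out = sum(np.sum(s ** 2) for s in (X_ll, X_lh, X_hl, X_hh))
print(energy_in, energy_out)   # equal: the stacked filter bank is orthogonal
```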

C. Implementation in CNN architectures
We implemented the proposed units in ResNet-family architectures. For the downsampling and pooling layers, the proposed UwU is followed by a one-layer FCN. In addition, the stride-2 convolution is replaced with a non-stride convolution block followed by the proposed UwU. The implementation is shown in Fig. 3.
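At the architecture level, the unit's dataflow can be sketched as follows. This is a shape-level NumPy sketch: the function signature, channel ordering, and the plain-matrix form of the 1x1 "one-layer FCN" are our simplifications of the design in Fig. 3, not the paper's exact code.

```python
import numpy as np

def analysis_matrix(h, size):
    """Downsampled circular Toeplitz matrix for filter taps h."""
    m = np.zeros((size // 2, size))
    for r in range(size // 2):
        for k, c in enumerate(h):
            m[r, (2 * r + k) % size] = c
    return m

def uwu_pool(x, h0, w_mix):
    """UwU + one-layer FCN as a drop-in downsampling unit (sketch).

    x     : (C, H, W) feature map
    h0    : trainable low-pass taps of UwU
    w_mix : (C_out, 4 * C) weights of the 1x1 FCN over stacked subbands
    """
    h1 = ((-1.0) ** np.arange(len(h0))) * h0[::-1]   # alternating flip
    C, H, W = x.shape
    Lr, Hr = analysis_matrix(h0, H), analysis_matrix(h1, H)
    Lc, Hc = analysis_matrix(h0, W), analysis_matrix(h1, W)
    subbands = [R @ x[ch] @ Cm.T                     # ll, lh, hl, hh per channel
                for ch in range(C) for R in (Lr, Hr) for Cm in (Lc, Hc)]
    s = np.stack(subbands)                           # (4C, H/2, W/2)
    return np.einsum("oc,chw->ohw", w_mix, s)        # 1x1 conv = channel mix

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16, 16))
h0 = np.array([1.0, 1.0]) / np.sqrt(2.0)             # Haar initialization
w_mix = rng.normal(size=(8, 32)) * 0.1
print(uwu_pool(x, h0, w_mix).shape)   # (8, 8, 8)
```

Replacing a stride-2 convolution then amounts to running the convolution with stride 1 and appending `uwu_pool`, so the detail subbands of the convolution output are mixed in rather than skipped.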
IV. EXPERIMENTAL RESULTS

A. ResNet18 on CIFAR10, ImageNet1K, and DTD

1) On CIFAR10: In general, the proposed methods show an improvement over the baseline ResNet18 comparable to that of WaveCNet or the original wavelet (with a one-layer FCN). The performance is shown in Table I. The best results from Table I are compared with the reported performances of Wavelet-Attention CNN ResNet18 [16] (WA-CNN-ResNet18), SpectralPool-ResNet18, and DiffStride-ResNet18 from [8]. The best performances on CIFAR10 are shown in Table II. The best UwU on CIFAR10 has higher accuracy than the baseline and a competitive performance compared with the original DB2.

B. ResNet34 and ResNet50 on DTD and ImageNet1K
For DTD, we tested the proposed unit on ResNet34 and ResNet50, comparing with WaveCNet for 2 and 4 taps, as shown in Table VI. On ImageNet1K, ResNet34 with our unit initialized with Symlet2 outperformed WaveCNet with Cohen 4.4 and the baseline model, with accuracies of 74.65%, 74.61%, and 73.30%, respectively. UwU consistently achieves the top performance on DTD and ImageNet1K, with a more significant accuracy gain on DTD.

C. As the Encoder of CFLOW-AD on MVTecAD (Hazelnut)
The ResNet18 with the proposed unit is used as the encoder in the CFLOW-AD [17] pipeline for the anomaly detection task on hazelnut images of MVTecAD [13], [14] and shows a

V. CONCLUSION
We present UwU, a learnable orthogonal wavelet unit paired with a one-layer FCN for optimal CNN feature maps. A defining feature of UwU is its learnable coefficients, setting it distinctly apart from earlier works that relied on static, untrainable coefficients. Its design meets the Aliasing Cancellation condition, halving the number of trainable filter parameters, and, with our new loss function, adheres to the PR constraint flexibly. We have integrated these techniques into ResNet-family architectures, yielding competitive results on CIFAR10, ImageNet1K, and DTD. Applied to the ResNet18 encoder in the CFLOW-AD pipeline, the proposed methods show promise in hazelnut anomaly detection. In future work, we will apply the technique to object detection and image segmentation.
This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the Innovative Human Resource Development for Local Intellectualization support program (ITP-2023-2020-0-01741) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation).

Fig. 1. From left to right, wavelet (Haar) and frequency representations of the samples from CIFAR10 (first column), ImageNet1K (second column), MVTecAD (third column), and DTD (fourth column). The original images (top row) are shown with their frequency representations (middle row) and wavelet representations (bottom row). X_ll, X_lh, X_hl, and X_hh show the coarse approximation and detail wavelet representations.

Fig. 4. Anomaly detection for hazelnut objects in MVTecAD. The first row shows pairs of the ground-truth mask (left) and the corresponding input (right). From the second to fourth rows, pairs of the attention map (left) and the segmentation (right) are overlaid on the corresponding input. The attention map using the proposed approach (last row) shows excellent localization in the anomaly area.

TABLE I. ACCURACY OF UWU ON RESNET18 FOR CIFAR10.

TABLE III. ACCURACY OF UWU ON RESNET18 FOR IMAGENET1K.

TABLE IV. ACCURACY OF UWU AND OTHER APPROACHES WITH THE RESNET18 ARCHITECTURE ON IMAGENET1K.

TABLE V. ACCURACY OF UWU ON RESNET18 FOR DTD.

TABLE VI. ACCURACY OF UWU ON RESNET34 AND RESNET50 FOR DTD.

TABLE VII. LOCALIZATION AND DETECTION AUROCS OF THE CFLOW-AD PIPELINE WITH THE UWU, WAVECNET, AND BASELINE RESNET18 ENCODERS FOR THE HAZELNUT CATEGORY IN MVTECAD.

comparable performance to the baseline ResNet18 and the WaveCNet ResNet18 encoders. The models were evaluated with localization and detection AUROCs, shown in Table VII. Defect detection result examples are visualized in Fig. 4.