2025-05-05 FaradAI Military Super-Resolution Task T6.2

Description

Using the super_resolution_dataset data set (careful not to accidentally touch other files), measure the results of upscaling military-vehicle footage from 320p and 720p to 1080p with publicly available models. Build a new model and compare. Image-quality enhancement models such as deblurring are also planned to be tested.
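A minimal sketch (assuming OpenCV and hypothetical folder names) for deriving the 320p/720p low-resolution inputs from the 1080p reference frames by bicubic downscaling, so that the public models' upscaled outputs can later be compared against the originals:

```python
import cv2

# Hypothetical paths; the real frames live under the dataset path given below.
hr = cv2.imread("hr_frames/frame_001.png")           # 1080p reference frame
h, w = hr.shape[:2]

for target_h in (320, 720):                           # resolutions named in the task
    scale = target_h / h
    lr = cv2.resize(hr, (round(w * scale), target_h), interpolation=cv2.INTER_CUBIC)
    cv2.imwrite(f"lr_{target_h}p/frame_001.png", lr)
```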

Path: /storage/telegram_war_videos_2024/high_res_dataset_v0

Host: 62.122.20.14, user: faradai, key: faradai_ventspils.key (sent separately)

GitHub repo that must contain all the code: https://github.com/asya-ai/faradai-t6-2-super-resolution

Some models and metrics are already implemented.

 

Preliminary results

RealESRGAN, NIQE score: 8.9769
CRAFT_SR, NIQE score: 7.2965

A lower score indicates better image quality.
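A minimal sketch for reproducing such a NIQE measurement on a single super-resolved frame with BasicSR's `calculate_niqe`; the output path is a hypothetical placeholder:

```python
import cv2
from basicsr.metrics import calculate_niqe

# Hypothetical path to one super-resolved frame (HWC, BGR, uint8).
img = cv2.imread("outputs/craft_sr/frame_001.png")

# crop_border=0 scores the whole frame; convert_to='y' evaluates the luma channel.
score = calculate_niqe(img, crop_border=0, input_order="HWC", convert_to="y")
print(f"NIQE: {score:.4f} (lower is better)")
```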

 

Screenshots: CleanShot 2025-05-05 at 23.47.32@2x, CleanShot 2025-05-05 at 23.51.12@2x

Example frames:

RVvoenkor_70014_6e31d3a6-0284-4337-922d-2a1c64245af1.mp4_001_1
RVvoenkor_69317_451bbe71-3a35-4d63-a622-fe804e618179.mp4_001_0
RVvoenkor_81287_2326c5d0-6973-4058-8269-fa77af030cbf.MOV_001_0

 

Tasks

  1. Use the military YOLO detector to filter out frames that contain no military units (a frame-filtering sketch follows the task list).

  2. Create a validation set of about 10% of the data in which all military unit types are represented (the annotations are in JSON files: each video comes with at least one annotated file; initially, 3 frames were extracted from each video and annotated). A split sketch follows the task list.

  3. Choose the best metric to compare against; it may be one that is not listed below (a combined metric-computation sketch follows the task list):

    1. NIQE (Naturalness Image Quality Evaluator): No-reference metric assessing perceptual quality based on the statistical naturalness of images, with lower scores indicating a more natural appearance. NIQE assesses how "natural" an image looks by modeling statistical regularities of natural images: it computes localized statistical features (from mean-subtracted, contrast-normalized pixels) and compares them to a pre-trained multivariate Gaussian model of natural image patches ([basicsr.readthedocs.io](https://basicsr.readthedocs.io/en/latest/api/basicsr.metrics.__init__.html)). Lower NIQE indicates more natural, undistorted imagery. NIQE is completely blind (no training on human ratings). It is often used to evaluate GAN-based SR on real images; e.g. Real-ESRGAN achieves NIQE ≈ 6.48 on Set5 (4×), slightly better (lower) than bicubic upscaling and previous GAN baselines (pmc.ncbi.nlm.nih.gov). NIQE is appropriate for measuring naturalness but may not align perfectly with human perception for all distortions.

    2. BRISQUE (Blind/Referenceless Image Spatial Quality Evaluator): No-reference. BRISQUE is a learned NR metric that uses statistical features of locally normalized luminance coefficients (sciencedirect.com). It fits distributions (e.g. AGGD) to image patch coefficients and uses features such as the variance, shape, and mean of these distributions to predict a quality score via a trained SVR regressor (live.ece.utexas.edu, pmc.ncbi.nlm.nih.gov). Lower BRISQUE scores correspond to higher perceived quality. BRISQUE is trained on human opinion scores of distorted images, making it effective for typical distortions (blur, noise, compression); for example, a super-resolved image with fewer artifacts will yield a lower BRISQUE. While BRISQUE is less used in recent SR papers, it remains relevant for classical NR-IQA evaluation of SR results.

    3. PI (Perceptual Index): No-reference. PI is a composite perceptual quality metric introduced in the PIRM 2018 SR challenge to balance fidelity and naturalness (arxiv.org). It is defined as:

      PI = (1/2) [ (10 − Ma) + NIQE ],

      where Ma is the Ma et al. learned no-reference quality score (0–10, higher = better) and NIQE is as above (arxiv.org). Lower PI means better perceptual quality (fewer distortions and a more natural appearance) (researchgate.net). PI was used to rank perceptual SR GANs; for instance, ESRGAN obtained PI ≈ 2.35, outperforming prior methods (arxiv.org). Real-ESRGAN and similar methods aim to minimize PI: they often slightly increase NIQE (due to GAN texture addition) but greatly improve the Ma score, yielding a low PI (arxiv.org). PI is most appropriate when optimizing for human opinion: it correlates fairly well with mean opinion scores of SR results.

    4. Learned NR-IQA metrics: Newer no-reference methods (e.g. SUPIR) employ learned NR-IQA models such as MANIQA, CLIP-IQA, or MUSIQ, which use deep neural networks to predict quality (openaccess.thecvf.com). These metrics often correlate better with human perception on diverse content. For example, SUPIR's outputs have the best CLIP-IQA and MUSIQ scores among contemporary models (openaccess.thecvf.com), indicating very natural outputs, even though classical NIQE/PI might not fully capture their improvements. Such metrics are gaining traction for evaluating high-level perceptual quality when ground truth is unavailable.

    5. PSNR (Peak Signal-to-Noise Ratio): Reference-based metric assessing the fidelity of super-resolved images by pixel-wise comparison.

      PSNR measures the pixel-wise fidelity between the super-resolved image I_SR and the ground-truth high-resolution image I_HR. It is defined as:

      PSNR = 10 · log10( MAX² / MSE(I_SR, I_HR) ),

      where MAX is the maximum possible pixel value (255 for 8-bit images) and MSE is the mean squared error (pmc.ncbi.nlm.nih.gov). Higher PSNR indicates lower distortion, so methods targeting fidelity (e.g. SwinIR) excel in PSNR (github.com). PSNR is most appropriate when the goal is to preserve exact pixel values (e.g. in benchmark settings with bicubic downscaling), but it correlates poorly with human perception of quality for large texture differences (openaccess.thecvf.com).

    6. SSIM (Structural Similarity Index): Reference-based metric measuring similarity in structural information between super-resolved images and original high-resolution images.

      SSIM evaluates perceived image quality by comparing structural information (luminance, contrast, and structure) between I_SR and I_HR. A common form (computed over local windows of the image) is:

      SSIM(x, y) = [ (2 μ_x μ_y + C1)(2 σ_xy + C2) ] / [ (μ_x² + μ_y² + C1)(σ_x² + σ_y² + C2) ],

      where μ_x, μ_y are the mean intensities, σ_x², σ_y² the variances, and σ_xy the covariance for the window; C1, C2 are small constants (pmc.ncbi.nlm.nih.gov). SSIM ranges from −1 to 1 (1 = perfect structural similarity). It is higher when the SR result preserves structural features (edges, textures) similarly to the ground truth. SSIM is often reported alongside PSNR to gauge structural fidelity; for example, SwinIR achieves SSIM ≈ 0.903 on Set5 (4×). SSIM is appropriate when structural/visual similarity is important, though like PSNR it favors smoothed results over perceptually sharp ones.

       

    7. LPIPS (Learned Perceptual Image Patch Similarity): LPIPS is a full-reference perceptual metric that measures the deep-feature difference between I_SR and I_HR. It computes feature representations (e.g. from a pretrained VGG network) for both images and calculates the L2 distance, often with learned linear weights per channel (openaccess.thecvf.com). Formally, if f_l(·) are the feature maps from layer l, LPIPS is:

      LPIPS(I_SR, I_HR) = Σ_l (1 / (H_l W_l)) Σ_{h,w} ‖ w_l ⊙ ( f_l(I_SR)_{h,w} − f_l(I_HR)_{h,w} ) ‖²_2,

      with channel-wise weights w_l learned to align with human judgments (openaccess.thecvf.com). Lower LPIPS means closer perceptual quality to the reference. Unlike PSNR/SSIM, LPIPS penalizes differences in texture or perceptual content rather than pixel-by-pixel error. It is most appropriate for GAN/diffusion-based SR models that sacrifice pixel fidelity for realism. For instance, GFPGAN (a GAN for faces) drastically lowers LPIPS (0.3646, vs ~0.48 for prior methods) at the cost of a slight PSNR drop (openaccess.thecvf.com). LPIPS is often used in perceptual SR competitions to complement or replace SSIM/PSNR.

    8. FSIM (Feature Similarity) and VIF (Visual Information Fidelity): For example, SwinIR reported a Visual Information Fidelity of ~0.47 on a test set (researchgate.net). These metrics also compare features of the reference and SR image: FSIM uses phase congruency and gradient magnitude, and VIF measures information loss. They can provide nuanced evaluations but are less common than PSNR/SSIM/LPIPS in recent SR benchmarks.

  4. Investigate and, where available, implement watermark-removal models, since many images in our dataset have watermarks overlaid. Some of the existing solutions use in-painting; check whether there are newer models that could achieve such results (a classical in-painting baseline follows the task list):

    1. WDNet (2021)

      1. GitHub: https://github.com/MRUIL/WDNet

      2. Hugging Face: Not available

    2. Split then Refine (2021)

      1. GitHub: https://github.com/vinthony/deep-blind-watermark-removal

      2. Hugging Face: Not available

    3. SLBR (2021)

      1. GitHub: https://github.com/bcmi/SLBR-Visible-Watermark-Removal

      2. Hugging Face: Not available

    4. Blind Visual Motif Removal (2019)

      1. GitHub: https://github.com/amirhertz/visual_motif_removal

      2. Hugging Face: Not available

    5. WatermarkRemover-AI (2023)

      1. GitHub: https://github.com/D-Ogi/WatermarkRemover-AI

      2. Hugging Face: Not available

    6. FODUU Watermark Removal (2025)

      1. GitHub: Not available

      2. Hugging Face: https://huggingface.co/foduucom/Watermark_Removal

     

     

  5. Implement open-source models (an example inference run follows the task list):

    1. Toolbox https://github.com/xinntao/EDVR/tree/master

    2. Real-ESRGAN

    3. SwinIR

    4. BasicSR (includes EDSR, RCAN, SRResNet, SRGAN, ESRGAN, EDVR, BasicVSR, SwinIR)

    5. GFPGAN

    6. LIIF

    7. SUPIR

    8. HAT (Hybrid Attention Transformer) (2023)

    9. DAT (Dual Aggregation Transformer) (2023)

    10. CRAFT (Cross-Refinement Adaptive Feature Transformer) (2023)

    11. Satlas ESRGAN (AllenAI Satlas) (2023)

    12. SinSR (Single-Step Diffusion SR) (2024)

    13. Latent Diffusion Super-Resolution (LDM-SR) (2022)

    14. InstantIR (2024)

    15. MAXIM (2022)

    16. Restormer (2022)

    17. Uformer (2022)

  6. Select the best model, train it on 90% of the data (with an 80% train/test split), and evaluate the effect of augmentations (a paired-augmentation sketch follows the task list).

  7. Improve the model and document the improvements.
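For task 1, a minimal frame-filtering sketch using the Ultralytics YOLO API; the weights file `weights/military_yolo.pt` and the folder names are hypothetical placeholders for the project's military detector and data layout:

```python
import shutil
from pathlib import Path

from ultralytics import YOLO

model = YOLO("weights/military_yolo.pt")   # hypothetical military YOLO checkpoint

frames_dir = Path("frames")                # extracted video frames
kept_dir = Path("frames_military")         # frames with at least one detection
kept_dir.mkdir(exist_ok=True)

for frame in sorted(frames_dir.glob("*.png")):
    result = model.predict(source=str(frame), conf=0.25, verbose=False)[0]
    if len(result.boxes) > 0:              # keep only frames containing military units
        shutil.copy(frame, kept_dir / frame.name)
```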
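For task 2, a sketch of sampling roughly 10% of the videos into a validation set so that every unit type is represented. The annotation schema (a JSON list of objects with a `label` field) and the directory names are assumptions; adapt them to the actual annotated files:

```python
import json
import random
from collections import defaultdict
from pathlib import Path

random.seed(0)
ann_dir = Path("annotations")              # hypothetical folder of per-video JSON files

# Group video names by the unit classes they contain (assumed "label" field per object).
videos_by_class = defaultdict(set)
for ann_path in ann_dir.glob("*.json"):
    for obj in json.loads(ann_path.read_text()):
        videos_by_class[obj["label"]].add(ann_path.stem)

# Greedily take ~10% of the videos of each class so every type appears in validation.
val_videos = set()
for label, vids in videos_by_class.items():
    vids = sorted(vids)
    random.shuffle(vids)
    val_videos.update(vids[: max(1, len(vids) // 10)])

Path("val_videos.txt").write_text("\n".join(sorted(val_videos)))
```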
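For task 3, a sketch that scores one super-resolved frame with several of the metrics listed above via the IQA-PyTorch (`pyiqa`) package; the file paths are hypothetical, and the ground-truth HR frame is only required for the full-reference metrics:

```python
import torch
import pyiqa
from PIL import Image
from torchvision.transforms.functional import to_tensor

def load(path):
    # Load an image as a 1x3xHxW float tensor in [0, 1].
    return to_tensor(Image.open(path).convert("RGB")).unsqueeze(0)

device = "cuda" if torch.cuda.is_available() else "cpu"
sr = load("outputs/realesrgan/frame_001.png").to(device)   # hypothetical SR output
hr = load("hr_frames/frame_001.png").to(device)            # hypothetical ground truth

# No-reference metrics (lower NIQE/BRISQUE = better).
for name in ["niqe", "brisque"]:
    metric = pyiqa.create_metric(name, device=device)
    print(name, float(metric(sr)))

# Full-reference metrics (higher PSNR/SSIM = better, lower LPIPS = better).
for name in ["psnr", "ssim", "lpips"]:
    metric = pyiqa.create_metric(name, device=device)
    print(name, float(metric(sr, hr)))
```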
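For task 4, a classical in-painting baseline with OpenCV that removes a watermark given a binary mask; the mask source is an assumption (it could come from a watermark detector or a manually drawn ROI), and learned models such as SLBR or WDNet would replace this step:

```python
import cv2
import numpy as np

img = cv2.imread("frames_military/frame_001.png")            # hypothetical input frame

# Hypothetical binary mask of the watermark region (white = watermark pixels).
mask = cv2.imread("masks/frame_001_watermark.png", cv2.IMREAD_GRAYSCALE)
mask = (mask > 127).astype(np.uint8) * 255

# Telea in-painting fills the masked region from the surrounding pixels.
restored = cv2.inpaint(img, mask, 3, cv2.INPAINT_TELEA)
cv2.imwrite("frames_clean/frame_001.png", restored)
```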
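For task 5, an example of batch inference with Real-ESRGAN through the inference script published in its repository (https://github.com/xinntao/Real-ESRGAN); the flag names follow that repository's README at the time of writing and the folders are placeholders, so re-check them against the cloned version:

```python
import subprocess

# Run the repo's inference script on a folder of low-resolution frames.
subprocess.run(
    [
        "python", "inference_realesrgan.py",
        "-n", "RealESRGAN_x4plus",     # pretrained general-purpose 4x model
        "-i", "frames_clean",          # hypothetical input folder
        "-o", "outputs/realesrgan",    # hypothetical output folder
        "--outscale", "4",
    ],
    check=True,
)
```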
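For task 6, a minimal sketch of paired LR/HR augmentation (identical random flips and 90° rotations applied to both images) that can be toggled on and off during training to measure the effect of augmentations:

```python
import numpy as np

def paired_augment(lr: np.ndarray, hr: np.ndarray, rng: np.random.Generator):
    """Apply the same random flip/rotation to an LR/HR pair of HWC arrays."""
    if rng.random() < 0.5:                     # horizontal flip
        lr, hr = lr[:, ::-1], hr[:, ::-1]
    if rng.random() < 0.5:                     # vertical flip
        lr, hr = lr[::-1], hr[::-1]
    k = int(rng.integers(0, 4))                # random multiple of 90 degrees
    lr, hr = np.rot90(lr, k), np.rot90(hr, k)
    return np.ascontiguousarray(lr), np.ascontiguousarray(hr)

# Example: rng = np.random.default_rng(0); lr_aug, hr_aug = paired_augment(lr, hr, rng)
```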

 


 

Potential results

Table 1 – 4× super-resolution performance on standard datasets (PSNR in dB / SSIM); values compiled from pmc.ncbi.nlm.nih.gov, github.com, and mdpi.com:

| Model | Set5 (PSNR / SSIM) | Set14 (PSNR / SSIM) | BSD100 (PSNR / SSIM) | Urban100 (PSNR / SSIM) |
| --- | --- | --- | --- | --- |
| Real-ESRGAN (Wang et al. 2021) – Enhanced ESRGAN for real images | 32.12 / 0.9116 | 29.33 / 0.7901 | 29.63 / 0.8587 | 28.86 / 0.8446 |
| SwinIR (Liang et al. 2021) – Swin Transformer for SR | 32.72 / 0.903 | 28.94 / 0.791 | 27.83 / 0.746 | 27.07 / 0.816 |
| LIIF (Chen et al. 2021) – Local Implicit Function (w/ RDN backbone) | 32.50 / 0.8988 | 28.80 / 0.7875 | 27.74 / 0.7420 | 26.68 / 0.8039 |
| GFPGAN (Wang et al. 2021) – GAN for face SR (mainly evaluated on face data) | N/A (see below) | N/A | N/A | N/A |