2025-05-05 FaradAI Military Super-Resolution Task T6.2

Description

Using the super_resolution_dataset data set (careful not to accidentally touch other files), measure the results of upscaling military-vehicle footage from 320p and 720p to 1080p with publicly available models. Build a new model and compare. Image-quality enhancement models such as deblurring are also planned to be tested.
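A minimal sketch (assuming OpenCV and hypothetical folder names) for deriving the 320p/720p low-resolution inputs from the 1080p reference frames by bicubic downscaling, so that the public models' upscaled outputs can later be compared against the originals:

```python
import cv2

# Hypothetical paths; the real frames live under the dataset path given below.
hr = cv2.imread("hr_frames/frame_001.png")           # 1080p reference frame
h, w = hr.shape[:2]

for target_h in (320, 720):                           # resolutions named in the task
    scale = target_h / h
    lr = cv2.resize(hr, (round(w * scale), target_h), interpolation=cv2.INTER_CUBIC)
    cv2.imwrite(f"lr_{target_h}p/frame_001.png", lr)
```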

Path: /storage/telegram_war_videos_2024/high_res_dataset_v0

Host: 62.122.20.14, user: faradai, key: faradai_ventspils.key (sent separately)

GitHub repo that must contain all the code: https://github.com/asya-ai/faradai-t6-2-super-resolution

Some models and metrics are already implemented.

 

Preliminary results

RealESRGAN, NIQE score: 8.9769
CRAFT_SR, NIQE score: 7.2965

A lower score indicates better image quality.
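A minimal sketch for reproducing such a NIQE measurement on a single super-resolved frame with BasicSR's `calculate_niqe`; the output path is a hypothetical placeholder:

```python
import cv2
from basicsr.metrics import calculate_niqe

# Hypothetical path to one super-resolved frame (HWC, BGR, uint8).
img = cv2.imread("outputs/craft_sr/frame_001.png")

# crop_border=0 scores the whole frame; convert_to='y' evaluates the luma channel.
score = calculate_niqe(img, crop_border=0, input_order="HWC", convert_to="y")
print(f"NIQE: {score:.4f} (lower is better)")
```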

 

Screenshots: CleanShot 2025-05-05 at 23.47.32@2x, CleanShot 2025-05-05 at 23.51.12@2x

Example frames:

RVvoenkor_70014_6e31d3a6-0284-4337-922d-2a1c64245af1.mp4_001_1
RVvoenkor_69317_451bbe71-3a35-4d63-a622-fe804e618179.mp4_001_0
RVvoenkor_81287_2326c5d0-6973-4058-8269-fa77af030cbf.MOV_001_0

 

Tasks

  1. Use the military YOLO detector to filter out frames that contain no military units (a frame-filtering sketch follows the task list).

  2. Create a validation set of about 10% of the data in which all military unit types are represented (the annotations are in JSON files: each video comes with at least one annotated file; initially, 3 frames were extracted from each video and annotated). A split sketch follows the task list.

  3. Choose the best metric to compare against; it may be one that is not listed below (a combined metric-computation sketch follows the task list):

    1. NIQE (Naturalness Image Quality Evaluator): No-reference metric assessing perceptual quality based on the statistical naturalness of images, with lower scores indicating a more natural appearance. NIQE assesses how "natural" an image looks by modeling statistical regularities of natural images: it computes localized statistical features (from mean-subtracted, contrast-normalized pixels) and compares them to a pre-trained multivariate Gaussian model of natural image patches ([basicsr.readthedocs.io](https://basicsr.readthedocs.io/en/latest/api/basicsr.metrics.__init__.html)). Lower NIQE indicates more natural, undistorted imagery. NIQE is completely blind (no training on human ratings). It is often used to evaluate GAN-based SR on real images; e.g. Real-ESRGAN achieves NIQE ≈ 6.48 on Set5 (4×), slightly better (lower) than bicubic upscaling and previous GAN baselines (pmc.ncbi.nlm.nih.gov). NIQE is appropriate for measuring naturalness but may not align perfectly with human perception for all distortions.

    2. BRISQUE (Blind/Referenceless Image Spatial Quality Evaluator): No-reference. BRISQUE is a learned NR metric that uses statistical features of locally normalized luminance coefficients (sciencedirect.com). It fits distributions (e.g. AGGD) to image patch coefficients and uses features such as the variance, shape, and mean of these distributions to predict a quality score via a trained SVR regressor (live.ece.utexas.edu, pmc.ncbi.nlm.nih.gov). Lower BRISQUE scores correspond to higher perceived quality. BRISQUE is trained on human opinion scores of distorted images, making it effective for typical distortions (blur, noise, compression); for example, a super-resolved image with fewer artifacts will yield a lower BRISQUE. While BRISQUE is less used in recent SR papers, it remains relevant for classical NR-IQA evaluation of SR results.

    3. PI (Perceptual Index): No-reference. PI is a composite perceptual quality metric introduced in the PIRM 2018 SR challenge to balance fidelity and naturalness (arxiv.org). It is defined as:

      PI = (1/2) [ (10 − Ma) + NIQE ],

      where Ma is the Ma et al. learned no-reference quality score (0–10, higher = better) and NIQE is as above (arxiv.org). Lower PI means better perceptual quality (fewer distortions and a more natural appearance) (researchgate.net). PI was used to rank perceptual SR GANs; for instance, ESRGAN obtained PI ≈ 2.35, outperforming prior methods (arxiv.org). Real-ESRGAN and similar methods aim to minimize PI: they often slightly increase NIQE (due to GAN texture addition) but greatly improve the Ma score, yielding a low PI (arxiv.org). PI is most appropriate when optimizing for human opinion: it correlates fairly well with mean opinion scores of SR results.

    4. Learned NR-IQA metrics: Newer no-reference methods (e.g. SUPIR) employ learned NR-IQA models such as MANIQA, CLIP-IQA, or MUSIQ, which use deep neural networks to predict quality (openaccess.thecvf.com). These metrics often correlate better with human perception on diverse content. For example, SUPIR's outputs have the best CLIP-IQA and MUSIQ scores among contemporary models (openaccess.thecvf.com), indicating very natural outputs, even though classical NIQE/PI might not fully capture their improvements. Such metrics are gaining traction for evaluating high-level perceptual quality when ground truth is unavailable.

    5. PSNR (Peak Signal-to-Noise Ratio): Reference-based metric assessing the fidelity of super-resolved images by pixel-wise comparison.

      PSNR measures the pixel-wise fidelity between the super-resolved image I_SR and the ground-truth high-resolution image I_HR. It is defined as:

      PSNR = 10 · log10( MAX² / MSE(I_SR, I_HR) ),

      where MAX is the maximum possible pixel value (255 for 8-bit images) and MSE is the mean squared error (pmc.ncbi.nlm.nih.gov). Higher PSNR indicates lower distortion, so methods targeting fidelity (e.g. SwinIR) excel in PSNR (github.com). PSNR is most appropriate when the goal is to preserve exact pixel values (e.g. in benchmark settings with bicubic downscaling), but it correlates poorly with human perception of quality for large texture differences (openaccess.thecvf.com).

    6. SSIM (Structural Similarity Index): Reference-based metric measuring similarity in structural information between super-resolved images and original high-resolution images.

      SSIM evaluates perceived image quality by comparing structural information (luminance, contrast, and structure) between I_SR and I_HR. A common form (computed over local windows of the image) is:

      SSIM(x, y) = [ (2 μ_x μ_y + C1)(2 σ_xy + C2) ] / [ (μ_x² + μ_y² + C1)(σ_x² + σ_y² + C2) ],

      where μ_x, μ_y are the mean intensities, σ_x², σ_y² the variances, and σ_xy the covariance for the window; C1, C2 are small constants (pmc.ncbi.nlm.nih.gov). SSIM ranges from −1 to 1 (1 = perfect structural similarity). It is higher when the SR result preserves structural features (edges, textures) similarly to the ground truth. SSIM is often reported alongside PSNR to gauge structural fidelity; for example, SwinIR achieves SSIM ≈ 0.903 on Set5 (4×). SSIM is appropriate when structural/visual similarity is important, though like PSNR it favors smoothed results over perceptually sharp ones.

       

    7. LPIPS (Learned Perceptual Image Patch Similarity): LPIPS is a full-reference perceptual metric that measures the deep-feature difference between I_SR and I_HR. It computes feature representations (e.g. from a pretrained VGG network) for both images and calculates the L2 distance, often with learned linear weights per channel (openaccess.thecvf.com). Formally, if f_l(·) are the feature maps from layer l, LPIPS is:

      LPIPS(I_SR, I_HR) = Σ_l (1 / (H_l W_l)) Σ_{h,w} ‖ w_l ⊙ ( f_l(I_SR)_{h,w} − f_l(I_HR)_{h,w} ) ‖²_2,

      with channel-wise weights w_l learned to align with human judgments (openaccess.thecvf.com). Lower LPIPS means closer perceptual quality to the reference. Unlike PSNR/SSIM, LPIPS penalizes differences in texture or perceptual content rather than pixel-by-pixel error. It is most appropriate for GAN/diffusion-based SR models that sacrifice pixel fidelity for realism. For instance, GFPGAN (a GAN for faces) drastically lowers LPIPS (0.3646, vs ~0.48 for prior methods) at the cost of a slight PSNR drop (openaccess.thecvf.com). LPIPS is often used in perceptual SR competitions to complement or replace SSIM/PSNR.

    8. FSIM (Feature Similarity) and VIF (Visual Information Fidelity): For example, SwinIR reported a Visual Information Fidelity of ~0.47 on a test set (researchgate.net). These metrics also compare features of the reference and SR image: FSIM uses phase congruency and gradient magnitude, and VIF measures information loss. They can provide nuanced evaluations but are less common than PSNR/SSIM/LPIPS in recent SR benchmarks.

  4. Investigate and, where available, implement watermark-removal models, since many images in our dataset have watermarks overlaid. Some of the existing solutions use in-painting; check whether there are newer models that could achieve such results (a classical in-painting baseline follows the task list):

    1. WDNet (2021)

      1. GitHub: https://github.com/MRUIL/WDNet

      2. Hugging Face: Not available

    2. Split then Refine (2021)

      1. GitHub: https://github.com/vinthony/deep-blind-watermark-removal

      2. Hugging Face: Not available

    3. SLBR (2021)

      1. GitHub: https://github.com/bcmi/SLBR-Visible-Watermark-Removal

      2. Hugging Face: Not available

    4. Blind Visual Motif Removal (2019)

      1. GitHub: https://github.com/amirhertz/visual_motif_removal

      2. Hugging Face: Not available

    5. WatermarkRemover-AI (2023)

      1. GitHub: https://github.com/D-Ogi/WatermarkRemover-AI

      2. Hugging Face: Not available

    6. FODUU Watermark Removal (2025)

      1. GitHub: Not available

      2. Hugging Face: https://huggingface.co/foduucom/Watermark_Removal

     

     

  5. Implement open-source models (an example inference run follows the task list):

    1. Toolbox https://github.com/xinntao/EDVR/tree/master

    2. Real-ESRGAN

    3. SwinIR

    4. BasicSR (includes EDSR, RCAN, SRResNet, SRGAN, ESRGAN, EDVR, BasicVSR, SwinIR)

    5. GFPGAN

    6. LIIF

    7. SUPIR

    8. HAT (Hybrid Attention Transformer) (2023)

    9. DAT (Dual Aggregation Transformer) (2023)

    10. CRAFT (Cross-Refinement Adaptive Feature Transformer) (2023)

    11. Satlas ESRGAN (AllenAI Satlas) (2023)

    12. SinSR (Single-Step Diffusion SR) (2024)

    13. Latent Diffusion Super-Resolution (LDM-SR) (2022)

    14. InstantIR (2024)

    15. MAXIM (2022)

    16. Restormer (2022)

    17. Uformer (2022)

  6. Select the best model, train it on 90% of the data (with an 80% train/test split), and evaluate the effect of augmentations (a paired-augmentation sketch follows the task list).

  7. Improve the model and document the improvements.
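For task 1, a minimal frame-filtering sketch using the Ultralytics YOLO API; the weights file `weights/military_yolo.pt` and the folder names are hypothetical placeholders for the project's military detector and data layout:

```python
import shutil
from pathlib import Path

from ultralytics import YOLO

model = YOLO("weights/military_yolo.pt")   # hypothetical military YOLO checkpoint

frames_dir = Path("frames")                # extracted video frames
kept_dir = Path("frames_military")         # frames with at least one detection
kept_dir.mkdir(exist_ok=True)

for frame in sorted(frames_dir.glob("*.png")):
    result = model.predict(source=str(frame), conf=0.25, verbose=False)[0]
    if len(result.boxes) > 0:              # keep only frames containing military units
        shutil.copy(frame, kept_dir / frame.name)
```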
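For task 2, a sketch of sampling roughly 10% of the videos into a validation set so that every unit type is represented. The annotation schema (a JSON list of objects with a `label` field) and the directory names are assumptions; adapt them to the actual annotated files:

```python
import json
import random
from collections import defaultdict
from pathlib import Path

random.seed(0)
ann_dir = Path("annotations")              # hypothetical folder of per-video JSON files

# Group video names by the unit classes they contain (assumed "label" field per object).
videos_by_class = defaultdict(set)
for ann_path in ann_dir.glob("*.json"):
    for obj in json.loads(ann_path.read_text()):
        videos_by_class[obj["label"]].add(ann_path.stem)

# Greedily take ~10% of the videos of each class so every type appears in validation.
val_videos = set()
for label, vids in videos_by_class.items():
    vids = sorted(vids)
    random.shuffle(vids)
    val_videos.update(vids[: max(1, len(vids) // 10)])

Path("val_videos.txt").write_text("\n".join(sorted(val_videos)))
```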
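For task 3, a sketch that scores one super-resolved frame with several of the metrics listed above via the IQA-PyTorch (`pyiqa`) package; the file paths are hypothetical, and the ground-truth HR frame is only required for the full-reference metrics:

```python
import torch
import pyiqa
from PIL import Image
from torchvision.transforms.functional import to_tensor

def load(path):
    # Load an image as a 1x3xHxW float tensor in [0, 1].
    return to_tensor(Image.open(path).convert("RGB")).unsqueeze(0)

device = "cuda" if torch.cuda.is_available() else "cpu"
sr = load("outputs/realesrgan/frame_001.png").to(device)   # hypothetical SR output
hr = load("hr_frames/frame_001.png").to(device)            # hypothetical ground truth

# No-reference metrics (lower NIQE/BRISQUE = better).
for name in ["niqe", "brisque"]:
    metric = pyiqa.create_metric(name, device=device)
    print(name, float(metric(sr)))

# Full-reference metrics (higher PSNR/SSIM = better, lower LPIPS = better).
for name in ["psnr", "ssim", "lpips"]:
    metric = pyiqa.create_metric(name, device=device)
    print(name, float(metric(sr, hr)))
```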
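For task 4, a classical in-painting baseline with OpenCV that removes a watermark given a binary mask; the mask source is an assumption (it could come from a watermark detector or a manually drawn ROI), and learned models such as SLBR or WDNet would replace this step:

```python
import cv2
import numpy as np

img = cv2.imread("frames_military/frame_001.png")            # hypothetical input frame

# Hypothetical binary mask of the watermark region (white = watermark pixels).
mask = cv2.imread("masks/frame_001_watermark.png", cv2.IMREAD_GRAYSCALE)
mask = (mask > 127).astype(np.uint8) * 255

# Telea in-painting fills the masked region from the surrounding pixels.
restored = cv2.inpaint(img, mask, 3, cv2.INPAINT_TELEA)
cv2.imwrite("frames_clean/frame_001.png", restored)
```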
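For task 5, an example of batch inference with Real-ESRGAN through the inference script published in its repository (https://github.com/xinntao/Real-ESRGAN); the flag names follow that repository's README at the time of writing and the folders are placeholders, so re-check them against the cloned version:

```python
import subprocess

# Run the repo's inference script on a folder of low-resolution frames.
subprocess.run(
    [
        "python", "inference_realesrgan.py",
        "-n", "RealESRGAN_x4plus",     # pretrained general-purpose 4x model
        "-i", "frames_clean",          # hypothetical input folder
        "-o", "outputs/realesrgan",    # hypothetical output folder
        "--outscale", "4",
    ],
    check=True,
)
```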
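For task 6, a minimal sketch of paired LR/HR augmentation (identical random flips and 90° rotations applied to both images) that can be toggled on and off during training to measure the effect of augmentations:

```python
import numpy as np

def paired_augment(lr: np.ndarray, hr: np.ndarray, rng: np.random.Generator):
    """Apply the same random flip/rotation to an LR/HR pair of HWC arrays."""
    if rng.random() < 0.5:                     # horizontal flip
        lr, hr = lr[:, ::-1], hr[:, ::-1]
    if rng.random() < 0.5:                     # vertical flip
        lr, hr = lr[::-1], hr[::-1]
    k = int(rng.integers(0, 4))                # random multiple of 90 degrees
    lr, hr = np.rot90(lr, k), np.rot90(hr, k)
    return np.ascontiguousarray(lr), np.ascontiguousarray(hr)

# Example: rng = np.random.default_rng(0); lr_aug, hr_aug = paired_augment(lr, hr, rng)
```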

 


 

Potential results

Table 1 – 4× super-resolution performance on standard datasets (PSNR in dB / SSIM); values compiled from pmc.ncbi.nlm.nih.gov, github.com, and mdpi.com:

| Model | Set5 (PSNR / SSIM) | Set14 (PSNR / SSIM) | BSD100 (PSNR / SSIM) | Urban100 (PSNR / SSIM) |
| --- | --- | --- | --- | --- |
| Real-ESRGAN (Wang et al. 2021) – Enhanced ESRGAN for real images | 32.12 / 0.9116 | 29.33 / 0.7901 | 29.63 / 0.8587 | 28.86 / 0.8446 |
| SwinIR (Liang et al. 2021) – Swin Transformer for SR | 32.72 / 0.903 | 28.94 / 0.791 | 27.83 / 0.746 | 27.07 / 0.816 |
| LIIF (Chen et al. 2021) – Local Implicit Function (w/ RDN backbone) | 32.50 / 0.8988 | 28.80 / 0.7875 | 27.74 / 0.7420 | 26.68 / 0.8039 |
| GFPGAN (Wang et al. 2021) – GAN for face SR (mainly evaluated on face data) | N/A (see below) | N/A | N/A | N/A |