2025-05-05 FaradAI Military Super-Resolution Task T6.2
Using the super_resolution_dataset dataset (careful: do not accidentally touch other files), measure the results of upscaling from 320p and 720p to 1080p, specifically on military vehicles, using publicly available models. Then build a new model and compare. We also plan to test image-quality enhancement models such as deblurring.
Path: /storage/telegram_war_videos_2024/high_res_dataset_v0
Host: 62.122.20.14, user: faradai, key: faradai_ventspils.key (sent separately)
GitHub repo that must contain all the code: https://github.com/asya-ai/faradai-t6-2-super-resolution
Some models and metrics are already implemented.
RealESRGAN – NIQE score: 8.9769
CRAFT_SR – NIQE score: 7.2965
A lower score indicates better image quality.
Use the military YOLO detector to filter out frames that contain no military units.
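A minimal sketch of this filtering step, assuming an Ultralytics-style YOLO checkpoint fine-tuned on military classes; the `military_yolo.pt` weights file, the directory names, and the confidence threshold are placeholders, not part of this task's repo:

```python
import shutil
from pathlib import Path

from ultralytics import YOLO  # pip install ultralytics

# Hypothetical checkpoint fine-tuned on military vehicle classes.
model = YOLO("military_yolo.pt")

frames_dir = Path("frames")
kept_dir = Path("frames_with_military_units")
kept_dir.mkdir(exist_ok=True)

for frame in sorted(frames_dir.glob("*.png")):
    # conf=0.25 is an assumed threshold; tune it against the labeled frames.
    results = model.predict(source=str(frame), conf=0.25, verbose=False)
    if len(results[0].boxes) > 0:
        # At least one military detection: the frame stays in the dataset.
        shutil.copy2(frame, kept_dir / frame.name)
```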
Create a validation set of 10% of the data in which all military unit types are represented (annotations are in JSON files: each video comes with at least one annotated file; initially, 3 frames were extracted from each video and annotated). A possible split procedure is sketched below.
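One way to build such a class-covering 10% split. The JSON schema here (an `objects` list with `label` fields) is an assumption about the annotation format and must be adapted to the actual files:

```python
import json
import random
from collections import defaultdict
from pathlib import Path

random.seed(0)
videos = sorted(Path("annotations").glob("*.json"))

# Group video IDs by the unit classes they contain.
by_class = defaultdict(list)
for path in videos:
    ann = json.loads(path.read_text())
    for label in {obj["label"] for obj in ann["objects"]}:  # assumed schema
        by_class[label].append(path.stem)

# Greedily reserve ~10% of the videos per class so every class is covered;
# this can overshoot 10% slightly, so trim the largest classes if needed.
val = set()
for label, vids in by_class.items():
    random.shuffle(vids)
    val.update(vids[:max(1, len(vids) // 10)])

train = [p.stem for p in videos if p.stem not in val]
print(f"{len(train)} train videos, {len(val)} validation videos")
```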
Choose the best metric for the comparison; there may be a suitable one that is not listed below.
NIQE (Naturalness Image Quality Evaluator): No-reference metric assessing perceptual quality based on the statistical naturalness of images; lower scores indicate a more natural, less distorted appearance. NIQE models statistical regularities of natural images: it computes localized statistical features (from mean-subtracted, contrast-normalized pixels) and compares them to a pre-trained multivariate Gaussian model of natural image patches ([basicsr.readthedocs.io](https://basicsr.readthedocs.io/en/latest/api/basicsr.metrics.__init__.html)). NIQE is completely blind, requiring no training on human ratings. It is often used to evaluate GAN-based SR on real images; e.g. Real-ESRGAN achieves NIQE ≈ 6.48 on Set5 (4×), slightly better (lower) than bicubic upscaling and previous GAN baselines (pmc.ncbi.nlm.nih.gov). NIQE is appropriate for measuring naturalness but may not align perfectly with human perception for all distortions.
BRISQUE (Blind/Referenceless Image Spatial Quality Evaluator): No-reference, learned metric that uses statistical features of locally normalized luminance coefficients (sciencedirect.com). It fits distributions (e.g. AGGD) to image patch coefficients and feeds features such as the variance, shape, and mean of these distributions to a trained SVR regressor that predicts a quality score (live.ece.utexas.edu, pmc.ncbi.nlm.nih.gov). Lower BRISQUE scores correspond to higher perceived quality. Because BRISQUE is trained on human opinion scores of distorted images, it is effective for typical distortions (blur, noise, compression); for example, a super-resolved image with fewer artifacts yields a lower BRISQUE. While BRISQUE is less used in recent SR papers, it remains relevant for classical NR-IQA evaluation of SR results.
Perceptual Index (PI): No-reference composite perceptual quality metric introduced in the PIRM 2018 SR challenge to balance fidelity and naturalness (arxiv.org). It is defined as:

$$\mathrm{PI} = \tfrac{1}{2}\big((10 - \mathrm{Ma}) + \mathrm{NIQE}\big)$$

where "Ma" is the Ma et al. learned no-reference quality score (0–10, higher = better) and NIQE is as above (arxiv.org). Lower PI means better perceptual quality (fewer distortions and a more natural appearance) (researchgate.net). PI was used to rank perceptual SR GANs; for instance, ESRGAN obtained PI ≈ 2.35, outperforming prior methods (arxiv.org). Real-ESRGAN and similar methods aim to minimize PI: they often slightly increase NIQE (due to GAN texture addition) but greatly improve the Ma score, yielding a low PI (arxiv.org). PI is most appropriate when optimizing for human opinion: it correlates fairly well with mean opinion scores of SR results.
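As a worked example of the definition above (the Ma and NIQE values are illustrative, not measured on our data):

```python
def perceptual_index(ma_score: float, niqe: float) -> float:
    """PI = 0.5 * ((10 - Ma) + NIQE); lower is better."""
    return 0.5 * ((10.0 - ma_score) + niqe)

# Illustrative numbers only: Ma = 8.2, NIQE = 6.5 -> PI = 4.15
print(perceptual_index(8.2, 6.5))
```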
Newer no-reference methods (e.g. SUPIR) employ learned NR-IQA models such as MANIQA, CLIP-IQA, or MUSIQ that use deep neural networks to predict quality (openaccess.thecvf.com). These metrics often correlate better with human perception on diverse content. For example, SUPIR's outputs have the best CLIP-IQA and MUSIQ scores among contemporary models (openaccess.thecvf.com), indicating very natural outputs, even though classical NIQE/PI might not fully capture their improvements. Such metrics are gaining traction for evaluating high-level perceptual quality when ground truth is unavailable.
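A minimal sketch for scoring SR outputs with several of these no-reference metrics via the IQA-PyTorch package (`pyiqa`); the metric names below are the ones pyiqa registers, but availability should be verified against the installed version, and the image path is a placeholder:

```python
import pyiqa  # pip install pyiqa
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# No-reference metrics discussed above: lower is better for niqe/brisque,
# higher is better for musiq/clipiqa (check metric.lower_better at runtime).
metrics = {name: pyiqa.create_metric(name, device=device)
           for name in ("niqe", "brisque", "musiq", "clipiqa")}

for name, metric in metrics.items():
    score = metric("sr_output.png")  # pyiqa accepts image paths or tensors
    print(f"{name}: {score.item():.4f} (lower is better: {metric.lower_better})")
```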
PSNR (Peak Signal-to-Noise Ratio): Reference-based metric assessing fidelity of super-resolved images by pixel-wise comparison.
PSNR measures the pixel-wise fidelity between the super-resolved image $\hat{I}$ and the ground-truth high-resolution image $I$:

$$\mathrm{PSNR} = 10 \log_{10}\!\left(\frac{\mathrm{MAX}^2}{\mathrm{MSE}(\hat{I}, I)}\right), \qquad \mathrm{MSE}(\hat{I}, I) = \frac{1}{N}\sum_{i=1}^{N}\big(\hat{I}_i - I_i\big)^2$$

where $\mathrm{MAX}$ is the maximum possible pixel value (255 for 8-bit images) and $N$ is the number of pixels. PSNR is reported in dB; higher values indicate closer pixel-level agreement.
SSIM (Structural Similarity Index): Reference-based metric measuring similarity in structural information between super-resolved images and original high-resolution images.
Structural Similarity Index (SSIM): SSIM evaluates perceived image quality by comparing structural information (luminance, contrast, and structure) between the super-resolved image $x$ and the reference image $y$:

$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$$

where $\mu_x, \mu_y$ are local means, $\sigma_x^2, \sigma_y^2$ local variances, $\sigma_{xy}$ the local covariance, and $C_1, C_2$ small constants that stabilize the division. SSIM ranges up to 1, with higher values indicating greater structural similarity.
Learned Perceptual Image Patch Similarity (LPIPS): LPIPS is a full-reference perceptual metric that measures the deep-feature distance between the super-resolved image $x$ and the reference $x_0$. Both images are passed through a pre-trained network (e.g. AlexNet or VGG), and unit-normalized activations $\hat{y}^l, \hat{y}^l_0$ from each layer $l$ are compared:

$$d(x, x_0) = \sum_l \frac{1}{H_l W_l} \sum_{h,w} \left\| w_l \odot \big(\hat{y}^l_{hw} - \hat{y}^l_{0,hw}\big) \right\|_2^2$$

with channel-wise weights $w_l$ learned to match human similarity judgments. Lower LPIPS indicates closer perceptual similarity.
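A minimal sketch computing the full-reference metrics above with scikit-image and the `lpips` package; the file paths are placeholders, and the images are assumed to be aligned and of equal size:

```python
import lpips  # pip install lpips
import torch
from skimage import io
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

sr = io.imread("sr_output.png")      # super-resolved image, HxWx3 uint8
hr = io.imread("ground_truth.png")   # aligned reference of the same size

psnr = peak_signal_noise_ratio(hr, sr, data_range=255)
ssim = structural_similarity(hr, sr, channel_axis=-1, data_range=255)

# LPIPS expects NCHW float tensors scaled to [-1, 1].
def to_tensor(a):
    return torch.from_numpy(a).permute(2, 0, 1)[None].float() / 127.5 - 1.0

loss_fn = lpips.LPIPS(net="alex")
with torch.no_grad():
    lpips_score = loss_fn(to_tensor(sr), to_tensor(hr)).item()

print(f"PSNR {psnr:.2f} dB, SSIM {ssim:.4f}, LPIPS {lpips_score:.4f}")
```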
Other full-reference options include Feature Similarity (FSIM) and Visual Information Fidelity (VIF). For example, SwinIR reported a VIF of ~0.47 on a test set (researchgate.net). These metrics also compare features of the reference and SR image: FSIM uses phase congruency and gradient magnitude, while VIF measures information loss. They can provide nuanced evaluations but are less common than PSNR/SSIM/LPIPS in recent SR benchmarks.
Research and, where available, implement watermark-removal models, since many items in our dataset have watermarks overlaid. Some of the existing solutions use in-painting; check whether there are new models that could achieve such results (a classical in-painting baseline is sketched after the model list below).
WDNet (2021)
GitHub: https://github.com/MRUIL/WDNet
Hugging Face: Not available
Split then Refine (2021)
GitHub: https://github.com/vinthony/deep-blind-watermark-removal
Hugging Face: Not available
SLBR (2021)
GitHub: https://github.com/bcmi/SLBR-Visible-Watermark-Removal
Hugging Face: Not available
Blind Visual Motif Removal (2019)
Hugging Face: Not available
WatermarkRemover-AI (2023)
Hugging Face: Not available
FODUU Watermark Removal (2025)
GitHub: Not available
Hugging Face: https://huggingface.co/foduucom/Watermark_Removal
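Before any of the learned models above, a quick baseline is classical in-painting with OpenCV. This assumes a binary mask of the watermark region is already available (the mask path is a placeholder); the learned methods listed above should outperform this on complex watermarks:

```python
import cv2  # pip install opencv-python

frame = cv2.imread("frame.png")
# White pixels mark the watermark region to reconstruct (assumed precomputed).
mask = cv2.imread("watermark_mask.png", cv2.IMREAD_GRAYSCALE)

# Telea's fast-marching in-painting; inpaintRadius is a tunable neighborhood size.
clean = cv2.inpaint(frame, mask, inpaintRadius=3, flags=cv2.INPAINT_TELEA)
cv2.imwrite("frame_clean.png", clean)
```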
Implement the open-source models (an inference sketch for Real-ESRGAN follows this list):
Real-ESRGAN
Hugging Face: https://huggingface.co/ai-forever/Real-ESRGAN
SwinIR
Hugging Face: https://huggingface.co/papers/2108.10257
BasicSR (includes EDSR, RCAN, SRResNet, SRGAN, ESRGAN, EDVR, BasicVSR, SwinIR)
Hugging Face: No official repository, available via community projects
GFPGAN
Hugging Face: https://huggingface.co/spaces/Xintao/GFPGAN
LIIF
GitHub: https://github.com/yinboc/liif
Hugging Face: No official repository, available via community projects
SUPIR
Hugging Face: https://huggingface.co/camenduru/SUPIR
HAT (Hybrid Attention Transformer) (2023)
Hugging Face (mirror of official models)
DAT (Dual Aggregation Transformer) (2023)
(No official Hugging Face repository yet.)
CRAFT (Cross-Refinement Adaptive Feature Transformer) (2023)
(No official Hugging Face repository yet.)
Satlas ESRGAN (AllenAI Satlas) (2023)
Hugging Face (AllenAI models and datasets)
SinSR (Single-Step Diffusion SR) (2024)
(No official Hugging Face repository yet.)
Latent Diffusion Super-Resolution (LDM-SR) (2022)
InstantIR (2024)
Hugging Face: https://huggingface.co/spaces/InstantX/InstantIR
MAXIM (2022)
Hugging Face: https://huggingface.co/google/maxim
Restormer (2022)
Hugging Face: https://huggingface.co/spaces/skytnt/restormer
Uformer (2022)
Hugging Face: Not available
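A minimal inference sketch for the first model on the list, following the `RealESRGAN` wrapper usage shown on the ai-forever Hugging Face page; the weights filename and the 4× scale are that page's defaults, and the frame paths are placeholders to adapt for the 320p/720p → 1080p experiments:

```python
import torch
from PIL import Image
from RealESRGAN import RealESRGAN  # pip install git+https://github.com/ai-forever/Real-ESRGAN.git

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = RealESRGAN(device, scale=4)
model.load_weights("weights/RealESRGAN_x4.pth", download=True)

lr_image = Image.open("frame_320p.png").convert("RGB")
sr_image = model.predict(lr_image)  # returns an upscaled PIL image
sr_image.save("frame_sr.png")
```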
Choose the best model, train it on 90% of the data (with an 80/20 train/test split), and check the effect of augmentations (see the sketch below).
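One way to probe the augmentation effect, sketched with albumentations; the specific transforms and probabilities are assumptions to be tuned, not project decisions. Train once with and once without this pipeline and compare the metrics above:

```python
import albumentations as A  # pip install albumentations
import cv2

# Degradation-style augmentations applied to LR inputs during training.
train_aug = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.3),
    A.GaussNoise(p=0.2),
    A.MotionBlur(p=0.2),
])

image = cv2.imread("frame_320p.png")
augmented = train_aug(image=image)["image"]
cv2.imwrite("frame_320p_aug.png", augmented)
```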
Improve the model and document the improvements.
Table 1 – 4× Super-Resolution Performance on Standard Datasets (PSNR in dB / SSIM); sources: pmc.ncbi.nlm.nih.gov, github.com, mdpi.com:
Model | Set5 (PSNR / SSIM) | Set14 (PSNR / SSIM) | BSD100 (PSNR / SSIM) | Urban100 (PSNR / SSIM) |
---|---|---|---|---|
Real-ESRGAN (Wang et al. 2021) – Enhanced ESRGAN for real images | 32.12 / 0.9116 | 29.33 / 0.7901 | 29.63 / 0.8587 | 28.86 / 0.8446 |
SwinIR (Liang et al. 2021) – Swin Transformer for SR | 32.72 / 0.903 | 28.94 / 0.791 | 27.83 / 0.746 | 27.07 / 0.816 |
LIIF (Chen et al. 2021) – Local Implicit Function (w/ RDN backbone) | 32.50 / 0.8988 | 28.80 / 0.7875 | 27.74 / 0.7420 | 26.68 / 0.8039 |
GFPGAN (Wang et al. 2021) – GAN for face SR (mainly evaluated on face data) | N/A (see below) | N/A | N/A | N/A |