2025-Q1-AI 10. UNet, Semantic Segmentation, Object Detection

 

10.1. Video / Materials (Wednesday, April 23, 18:00)

Zoom (Evalds will admit participants to the Zoom call):

https://zoom.us/j/3167417956?pwd=Q2NoNWp2a3M2Y2hRSHBKZE1Wcml4Zz09

Whiteboard: https://www.figma.com/board/TgCeSmvMY8EE9ad0xAtKYU/2025-Q1-AI-10.-UNet--Semantic-Segmentation--Object-Detection?node-id=1-3&t=ki4pozWElf61D3X8-1

Preparation materials:

 

 

Video from last year: https://youtube.com/live/VV6Q2KM9zOY?feature=share

Completed source code: http://share.yellowrobot.xyz/quick/2023-11-16-151E1E92-44FA-47BD-9500-0FA4CA2A6356.zip

YOLO example

https://share.yellowrobot.xyz/quick/2025-4-21-97299DB9-C7F3-4545-9223-290D25B75BAA.zip

 


 

Previous year's video and Jamboard

Video:

https://youtu.be/um1U66VDd5M

Jamboard:

https://jamboard.google.com/d/1BAwk5fRda0dWjI4sDVzIORKLsuqfIIC4niuBJG56_tc/edit?usp=sharing

Contents

1. Explain the types of segmentation

Types:

  1. Image recognition / classification with a sliding window

  2. Semantic segmentation (FCN, DeepLab, UNet)


     

  3. Object Detection (YOLO)

  4. Instance Segmentation (MaskRCNN)

  5. Panoptic segmentation (MaskFormer, PQ metric) https://github.com/sithu31296/panoptic-segmentation

Panoptic segmentation offers key advantages over instance segmentation (e.g., Mask R-CNN) by providing complete scene understanding through unified semantic and instance labeling. Here's how it improves upon traditional instance segmentation:

| Feature | Panoptic Segmentation | Instance Segmentation (Mask R-CNN) |
| --- | --- | --- |
| Scope | Labels every pixel | Focuses only on countable objects |
| Output | Combines semantic + instance IDs | Only instance masks + class labels |
| Ambiguity Handling | Resolves overlaps (no conflicts) | Allows overlapping masks |
| "Stuff" Handling | Labels amorphous regions (sky, road) | Ignores non-object regions |
| Scene Comprehension | Full context awareness | Partial, object-focused understanding |


2. Explain the metrics

F1, mAP (10%..100%), AP70 (70% threshold), IoU, PQ, DICE

 

The Dice Coefficient (Jha et al. 2021a; Shamir et al. 2019) is a commonly used statistic for comparing a predicted segmentation with the ground truth pixel by pixel. It is defined as:

(76) $\mathrm{DSC}(A,B) = \dfrac{2 \times |A \cap B|}{|A| + |B|} = \dfrac{2 \times TP}{(2 \times TP) + FP + FN}$
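For reference, a minimal NumPy sketch of the Dice coefficient for binary masks; the function name and the toy masks are illustrative only, not part of the course code:

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice similarity between two binary masks (values 0 or 1)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()  # |A ∩ B| = TP
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

# Example: two 3x3 masks that overlap in two pixels
a = np.array([[1, 1, 0], [0, 1, 0], [0, 0, 0]])
b = np.array([[1, 0, 0], [0, 1, 1], [0, 0, 0]])
print(dice_coefficient(a, b))  # 2*2 / (3 + 3) ≈ 0.667
```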

 

Intersection over union (IoU) is a popular metric in segmentation (e.g., polyp segmentation). It measures the number of pixels common to the target and prediction masks divided by the total number of pixels present across both masks. It ranges from 0 to 1, where a value of zero indicates no overlap and a value of one indicates perfect overlap. The mean IoU of an image is obtained by averaging the IoU of each class, both for binary segmentation (two classes) and for multi-class segmentation. For two bounding boxes A and B, the overlap is likewise measured as the ratio of the intersection area to the union area (Rezatofighi et al. 2019).

(77) $\mathrm{IoU}(A,B) = \dfrac{|A \cap B|}{|A \cup B|} = \dfrac{TP}{TP + FP + FN}$
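The same toy masks can be reused for a minimal IoU sketch (again an illustration, not the course template):

```python
import numpy as np

def iou(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Intersection over Union (Jaccard index) for two binary masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()   # TP
    union = np.logical_or(pred, target).sum()           # TP + FP + FN
    return (intersection + eps) / (union + eps)

a = np.array([[1, 1, 0], [0, 1, 0], [0, 0, 0]])
b = np.array([[1, 0, 0], [0, 1, 1], [0, 0, 0]])
print(iou(a, b))  # 2 / 4 = 0.5
```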

 

Pixel accuracy measures the model's performance across all classes and is most useful when every class has equal importance. It is calculated by dividing the number of correct predictions by the total number of predictions (Coleman et al. 2019). For image segmentation this amounts to reporting the percentage of pixels in the image that were classified correctly. Pixel accuracy is commonly reported both per class and globally across all classes. The per-class pixel accuracy is evaluated on a binary mask: true positives are pixels correctly predicted to belong to the class (according to the target mask), and true negatives are pixels correctly identified as not belonging to it (Ye et al. 2018).

(78) $\mathrm{Pixel\ Accuracy} = \dfrac{TP + TN}{TP + TN + FP + FN}$

 

Recall is defined as the percentage of ground-truth boundary pixels that were correctly identified by the automatic segmentation. It measures the proportion of positive image samples correctly classified as positive out of the total number of positive samples, i.e., how well the model identifies positive samples. The higher the recall, the more positive samples are detected (Aguiar et al. 2019; Xu et al. 2019).

(79) $\mathrm{Recall} = \dfrac{TP}{TP + FN}$
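A small sketch that computes the confusion counts explicitly and derives both pixel accuracy and recall from them (names and toy masks are illustrative):

```python
import numpy as np

def confusion_counts(pred: np.ndarray, target: np.ndarray):
    """TP, TN, FP, FN for binary masks (values 0 or 1)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    tp = np.logical_and(pred, target).sum()
    tn = np.logical_and(~pred, ~target).sum()
    fp = np.logical_and(pred, ~target).sum()
    fn = np.logical_and(~pred, target).sum()
    return tp, tn, fp, fn

tp, tn, fp, fn = confusion_counts(
    np.array([[1, 1, 0], [0, 1, 0], [0, 0, 0]]),
    np.array([[1, 0, 0], [0, 1, 1], [0, 0, 0]]),
)
pixel_accuracy = (tp + tn) / (tp + tn + fp + fn)
recall = tp / (tp + fn)
print(pixel_accuracy, recall)  # 7/9 ≈ 0.778 and 2/3 ≈ 0.667
```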

 

Average precision is defined as the weighted mean of precisions achieved at each threshold, with the increase in recall from the previous threshold used as the weight:

(80) $\mathrm{Average\ Precision} = \sum_n (R_n - R_{n-1}) \, P_n$

where $P_n$ and $R_n$ are the precision and recall at the $n$-th threshold. This implementation is not interpolated and differs from computing the area under the precision-recall curve with the trapezoidal rule, which uses linear interpolation and can be too optimistic (Perez-Borrero et al. 2021; De Moura Lima et al. 2023).
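A minimal sketch of this non-interpolated sum, assuming precision and recall arrays ordered by threshold with non-decreasing recall (the numbers are made up for illustration):

```python
import numpy as np

def average_precision(precision: np.ndarray, recall: np.ndarray) -> float:
    """AP = sum_n (R_n - R_{n-1}) * P_n, with R_{-1} taken as 0."""
    recall_prev = np.concatenate(([0.0], recall[:-1]))
    return float(np.sum((recall - recall_prev) * precision))

# Toy precision/recall values at successive thresholds (illustrative numbers)
precision = np.array([1.0, 0.5, 0.67, 0.75])
recall = np.array([0.25, 0.25, 0.5, 0.75])
print(average_precision(precision, recall))  # 0.25*1.0 + 0*0.5 + 0.25*0.67 + 0.25*0.75
```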

Panoptic Quality (PQ) is the standard metric for panoptic segmentation, a task that unifies semantic and instance segmentation. The metric was introduced by Kirillov et al. in "Panoptic Segmentation" (CVPR 2019).

(81) $PQ_c = \dfrac{\sum_{(p,g) \in TP_c} \mathrm{IoU}(p,g)}{|TP_c| + \tfrac{1}{2}(|FP_c| + |FN_c|)}, \qquad PQ = \dfrac{1}{N} \sum_c PQ_c$

(82) $SQ_c = \dfrac{\sum_{(p,g) \in TP_c} \mathrm{IoU}(p,g)}{|TP_c|}, \qquad RQ_c = \dfrac{|TP_c|}{|TP_c| + \tfrac{1}{2}(|FP_c| + |FN_c|)}, \qquad PQ_c = SQ_c \cdot RQ_c$
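A simplified single-class PQ sketch, assuming predicted and ground-truth segments are given as lists of binary masks and using the IoU > 0.5 matching rule from the paper (an illustration, not an official implementation):

```python
import numpy as np

def panoptic_quality_single_class(pred_segments, gt_segments, iou_thr=0.5):
    """Simplified PQ for one class. Each segment is a binary NumPy mask.
    A prediction matches a ground-truth segment when IoU > iou_thr; for thr >= 0.5 the match is unique."""
    matched_gt, tp_ious, fp = set(), [], 0
    for p in pred_segments:
        best_iou, best_j = 0.0, None
        for j, g in enumerate(gt_segments):
            if j in matched_gt:
                continue
            inter = np.logical_and(p, g).sum()
            union = np.logical_or(p, g).sum()
            iou = inter / union if union else 0.0
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_j is not None and best_iou > iou_thr:
            matched_gt.add(best_j)
            tp_ious.append(best_iou)
        else:
            fp += 1
    tp, fn = len(tp_ious), len(gt_segments) - len(tp_ious)
    sq = sum(tp_ious) / tp if tp else 0.0                             # segmentation quality
    rq = tp / (tp + 0.5 * fp + 0.5 * fn) if (tp + fp + fn) else 0.0   # recognition quality
    return sq * rq                                                    # PQ_c = SQ_c * RQ_c

a = np.zeros((4, 4), bool); a[:2, :2] = True   # predicted segment (4 pixels)
b = np.zeros((4, 4), bool); b[:2, :3] = True   # ground-truth segment (6 pixels)
print(panoptic_quality_single_class([a], [b]))  # IoU = 4/6 ≈ 0.67 → PQ ≈ 0.67
```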

 

 


 


 

 

https://link.springer.com/article/10.1007/s10462-023-10621-1

https://www.picsellia.com/post/coco-evaluation-metrics-explained

3. Explain Semantic Segmentation / UNet

 

 


 


 

https://link.springer.com/article/10.1007/s10462-023-10621-1

 

4. Have the students implement UNet with concat themselves, and demonstrate if needed

6. Have the students implement UNet with addition themselves, and demonstrate if needed

 

8. YOLO exercise

 

A YAML file (YAML Ain't Markup Language, originally "Yet Another Markup Language") is used to define the dataset configuration. It contains the dataset's paths, classes, and other relevant information.
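A minimal sketch of what such a dataset YAML can look like, written from Python; the paths and class names below are placeholders in the Ultralytics data-config layout (path, train, val, names), not the actual course dataset:

```python
from pathlib import Path

# Hypothetical dataset config: 'path' is the dataset root, 'train'/'val' are image
# folders relative to it, and 'names' maps class indices to class names.
dataset_yaml = """\
path: ./datasets/my_dataset   # dataset root (placeholder)
train: images/train           # training images, relative to 'path'
val: images/val               # validation images, relative to 'path'
names:
  0: person
  1: car
"""

Path("my_dataset.yaml").write_text(dataset_yaml)
```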


 

COCO Dataset

 

Training parameters https://docs.ultralytics.com/modes/train/#train-settings
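A minimal training call with the Ultralytics Python API; the checkpoint name and parameter values are example choices only (see the settings page above for the full list):

```python
from ultralytics import YOLO  # assumes the ultralytics package is installed

model = YOLO("yolov8n.pt")            # small pretrained checkpoint (example choice)
model.train(
    data="my_dataset.yaml",           # dataset config from the previous step
    epochs=10,                        # example value
    imgsz=640,                        # input image size
    batch=16,
)
```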


 

https://www.researchgate.net/figure/Pipeline-of-YOLOs-algorithm-12_fig1_350090136

 


 

Non-max suppression algorithm

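A minimal sketch of greedy non-max suppression over axis-aligned boxes, assuming boxes given as [x1, y1, x2, y2] with per-box confidence scores (purely illustrative, not the YOLO internal implementation):

```python
import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray, iou_thr: float = 0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it above iou_thr, repeat."""
    order = scores.argsort()[::-1]          # indices sorted by descending confidence
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # IoU of the top box against the remaining boxes
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[order[1:], 2] - boxes[order[1:], 0]) * (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + area_r - inter + 1e-7)
        order = order[1:][iou <= iou_thr]   # keep only boxes with low overlap
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2]: the second box overlaps the first too much
```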

 

Oriented Bounding Boxes

Oriented object detection goes a step further than standard object detection by introducing an extra angle to locate objects more accurately in an image.

The output of an oriented object detector is a set of rotated bounding boxes that precisely enclose the objects in the image, along with class labels and confidence scores for each box. Oriented bounding boxes are particularly useful when objects appear at various angles, such as in aerial imagery, where traditional axis-aligned bounding boxes may include unnecessary background.


https://docs.ultralytics.com/tasks/obb/
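A minimal OBB inference sketch with the Ultralytics API; the checkpoint name and image path are placeholders, and the exact result fields may differ between library versions:

```python
from ultralytics import YOLO  # assumes the ultralytics package is installed

model = YOLO("yolov8n-obb.pt")          # pretrained OBB checkpoint (example choice)
results = model("aerial_image.jpg")     # placeholder image path
for r in results:
    print(r.obb.xywhr)                  # rotated boxes: center x/y, width, height, angle
```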

 


 

10.2. Implement the UNet model's forward function with concat

The source code template is available here: http://share.yellowrobot.xyz/quick/2023-11-16-C31604C9-53E7-49D2-8D21-B13FF215C139.zip

Implement the UNet model's forward function with concatenation. The equations are given below:

(83)
$$
\begin{aligned}
z_1 &= \mathrm{conv}_1(x) \\
z_2 &= \mathrm{conv}_2(\mathrm{maxpool}(z_1)) \\
z_3 &= \mathrm{conv}_3(\mathrm{maxpool}(z_2)) \\
z_4 &= \mathrm{conv}_4(\mathrm{maxpool}(z_3)) \\
z_{mid} &= \mathrm{conv}_{mid}(z_4) \\
u_3 &= \mathrm{upsample}(\mathrm{conv}_{u4}([z_{mid}, z_4])) \\
u_2 &= \mathrm{upsample}(\mathrm{conv}_{u3}([u_3, z_3])) \\
u_1 &= \mathrm{upsample}(\mathrm{conv}_{u2}([u_2, z_2])) \\
u_1 &= \sigma(\mathrm{conv}_{u1}([u_1, z_1]))
\end{aligned}
$$

Submit screenshots and source code.

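If a reference is needed, a minimal PyTorch sketch of the concat forward pass; the layer definitions and channel widths are assumptions for illustration, not the course template:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def block(in_ch, out_ch):
    # two 3x3 convolutions with ReLU, as in a typical UNet stage (assumed structure)
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class UNetConcat(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1, self.conv2 = block(1, 8), block(8, 16)
        self.conv3, self.conv4 = block(16, 32), block(32, 64)
        self.conv_mid = block(64, 64)
        # decoder convs take the concatenated channels as input
        self.conv_u4, self.conv_u3 = block(64 + 64, 32), block(32 + 32, 16)
        self.conv_u2 = block(16 + 16, 8)
        self.conv_u1 = nn.Conv2d(8 + 8, 1, 3, padding=1)  # final conv, sigmoid in forward

    def forward(self, x):
        z1 = self.conv1(x)
        z2 = self.conv2(F.max_pool2d(z1, 2))
        z3 = self.conv3(F.max_pool2d(z2, 2))
        z4 = self.conv4(F.max_pool2d(z3, 2))
        z_mid = self.conv_mid(z4)
        u3 = F.interpolate(self.conv_u4(torch.cat([z_mid, z4], dim=1)), scale_factor=2)
        u2 = F.interpolate(self.conv_u3(torch.cat([u3, z3], dim=1)), scale_factor=2)
        u1 = F.interpolate(self.conv_u2(torch.cat([u2, z2], dim=1)), scale_factor=2)
        return torch.sigmoid(self.conv_u1(torch.cat([u1, z1], dim=1)))

print(UNetConcat()(torch.randn(1, 1, 64, 64)).shape)  # torch.Size([1, 1, 64, 64])
```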

 


 

10.3. Implement the UNet model's forward function with addition

Use the template from the previous exercise and implement the UNet model's forward function with addition. The model structure (number of channels) also needs to be changed.

The equations are given below:

(84)
$$
\begin{aligned}
z_1 &= \mathrm{conv}_1(x) \\
z_2 &= \mathrm{conv}_2(\mathrm{maxpool}(z_1)) \\
z_3 &= \mathrm{conv}_3(\mathrm{maxpool}(z_2)) \\
z_4 &= \mathrm{conv}_4(\mathrm{maxpool}(z_3)) \\
z_{mid} &= \mathrm{conv}_{mid}(z_4) \\
u_3 &= \mathrm{upsample}(\mathrm{conv}_{u4}(z_{mid} + z_4)) \\
u_2 &= \mathrm{upsample}(\mathrm{conv}_{u3}(u_3 + z_3)) \\
u_1 &= \mathrm{upsample}(\mathrm{conv}_{u2}(u_2 + z_2)) \\
u_1 &= \sigma(\mathrm{conv}_{u1}(u_1 + z_1))
\end{aligned}
$$

Submit screenshots and source code.
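If a reference is needed, a minimal sketch of the addition variant, written on top of the UNetConcat sketch from 10.2; the channel widths are the same assumed values, so this is illustrative only:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
# reuses `block` and `UNetConcat` from the sketch in 10.2

class UNetAdd(UNetConcat):
    def __init__(self):
        super().__init__()
        # Addition needs matching channel counts, so each decoder conv now takes a
        # single stage's channels instead of the concatenated pair:
        self.conv_u4, self.conv_u3 = block(64, 32), block(32, 16)
        self.conv_u2 = block(16, 8)
        self.conv_u1 = nn.Conv2d(8, 1, 3, padding=1)

    def forward(self, x):
        z1 = self.conv1(x)
        z2 = self.conv2(F.max_pool2d(z1, 2))
        z3 = self.conv3(F.max_pool2d(z2, 2))
        z4 = self.conv4(F.max_pool2d(z3, 2))
        z_mid = self.conv_mid(z4)
        u3 = F.interpolate(self.conv_u4(z_mid + z4), scale_factor=2)  # sum instead of concat
        u2 = F.interpolate(self.conv_u3(u3 + z3), scale_factor=2)
        u1 = F.interpolate(self.conv_u2(u2 + z2), scale_factor=2)
        return torch.sigmoid(self.conv_u1(u1 + z1))

print(UNetAdd()(torch.randn(1, 1, 64, 64)).shape)  # torch.Size([1, 1, 64, 64])
```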

 


 

10.4. Implement a UNet model with a LinearLayer in the middle

Use the template from the previous exercise and implement the UNet model's forward function with a LinearLayer in the middle. Such a model has to be built for a known input image size, unlike the previous models, which were FCNs (Fully Convolutional Networks).

The equations are given below:

(85)
$$
\begin{aligned}
z_1 &= \mathrm{conv}_1(x) \\
z_2 &= \mathrm{conv}_2(\mathrm{maxpool}(z_1)) \\
z_3 &= \mathrm{conv}_3(\mathrm{maxpool}(z_2)) \\
z_4 &= \mathrm{conv}_4(\mathrm{maxpool}(z_3)) \\
z_{mid} &= \mathrm{reshape}_{2d}(\mathrm{linear}_{mid}(\mathrm{reshape}_{1d}(z_4))) \\
u_3 &= \mathrm{upsample}(\mathrm{conv}_{u4}(z_{mid} + z_4)) \\
u_2 &= \mathrm{upsample}(\mathrm{conv}_{u3}(u_3 + z_3)) \\
u_1 &= \mathrm{upsample}(\mathrm{conv}_{u2}(u_2 + z_2)) \\
u_1 &= \sigma(\mathrm{conv}_{u1}(u_1 + z_1))
\end{aligned}
$$

Submit screenshots and source code.
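A minimal sketch of the bottleneck with a linear layer, assuming a fixed 64x64 input so that z4 has shape (B, 64, 8, 8) with the channel widths used in the sketches above (illustrative only):

```python
import torch
import torch.nn as nn

# The fully connected bottleneck fixes the input size: the flattened length 64*8*8
# only matches when the input image is 64x64 with the assumed channel widths.
linear_mid = nn.Linear(64 * 8 * 8, 64 * 8 * 8)

def bottleneck(z4: torch.Tensor) -> torch.Tensor:
    b, c, h, w = z4.shape
    flat = z4.reshape(b, c * h * w)        # reshape_1d
    out = linear_mid(flat)                 # linear_mid
    return out.reshape(b, c, h, w)         # reshape_2d

print(bottleneck(torch.randn(2, 64, 8, 8)).shape)  # torch.Size([2, 64, 8, 8])
```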


 

 

10.5. Implement object detection on the COCO dataset using YOLO

Following the video instructions, implement object detection on the COCO dataset using YOLO. Template:

https://share.yellowrobot.xyz/quick/2025-4-21-539AB043-B601-4C2F-93A5-01D682274BC5.zip


Submit screenshots and source code.
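A minimal inference sketch with a COCO-pretrained checkpoint; the weights name and image path are placeholders:

```python
from ultralytics import YOLO  # assumes the ultralytics package is installed

model = YOLO("yolov8n.pt")           # checkpoint pretrained on COCO (example choice)
results = model("example.jpg")       # placeholder image path
r = results[0]
for box, cls, conf in zip(r.boxes.xyxy, r.boxes.cls, r.boxes.conf):
    print(model.names[int(cls)], round(float(conf), 2), [round(v, 1) for v in box.tolist()])
annotated = r.plot()                 # NumPy array with the detections drawn on the image
```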


 

10.6. Homework. Implement the DICE loss and the IoU metric

 

  1. Add the DICE loss to the previous exercise and use it as a loss function (create a composite loss function with a coefficient for each loss term); a sketch is given after this list.

     

    (86) $L_{\mathrm{DICE}} = 1 - \dfrac{2 \sum_i y_i \hat{y}_i}{\sum_i y_i + \sum_i \hat{y}_i}$

     

  2. Add the IoU metric (Jaccard index) and plot it.

    (87) $\mathrm{IoU} = \dfrac{1}{N} \sum \dfrac{TP + \epsilon}{TP + FP + FN + \epsilon} = \dfrac{1}{N} \sum \dfrac{y\hat{y} + \epsilon}{y\hat{y} + (1-y)\hat{y} + y(1-\hat{y}) + \epsilon}$

     

  3. Extra task: implement the UNet++ architecture: https://arxiv.org/pdf/1807.10165

Train the models with the new changes and submit screenshots and source code.
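A minimal sketch of the Dice loss, a soft IoU metric, and a composite loss, assuming sigmoid-probability predictions and binary targets; the BCE term and the weights are example choices, not prescribed values:

```python
import torch
import torch.nn.functional as F

def dice_loss(y_hat: torch.Tensor, y: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """L_DICE = 1 - 2*sum(y*y_hat) / (sum(y) + sum(y_hat)); y_hat are probabilities, y are binary targets."""
    intersection = (y * y_hat).sum()
    return 1.0 - (2.0 * intersection + eps) / (y.sum() + y_hat.sum() + eps)

def iou_metric(y_hat: torch.Tensor, y: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """Soft IoU (Jaccard index), averaged over the batch dimension."""
    dims = tuple(range(1, y.dim()))
    intersection = (y * y_hat).sum(dim=dims)
    union = (y + y_hat - y * y_hat).sum(dim=dims)
    return ((intersection + eps) / (union + eps)).mean()

def composite_loss(y_hat, y, w_bce=0.5, w_dice=0.5):
    """Weighted sum of BCE and Dice; the coefficients are example values."""
    return w_bce * F.binary_cross_entropy(y_hat, y) + w_dice * dice_loss(y_hat, y)

# toy check on random predictions
y_hat = torch.rand(2, 1, 8, 8)
y = (torch.rand(2, 1, 8, 8) > 0.5).float()
print(composite_loss(y_hat, y).item(), iou_metric(y_hat, y).item())
```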