2023-Q3-AI 3. Linear Regression, Backpropagation - NumPy

3.1. Video / Materials

Video (19 Jul 2023, 10:00): https://youtu.be/cpzNledmu2E

Jamboard: https://jamboard.google.com/d/1r9VCyZpV5kPotlclSJh-v3OJ4kHM7xo0vQwmHorbwN4/edit?usp=sharing

Materials: https://www.youtube.com/playlist?list=PL0-GT3co4r2wlh6UHTUeQsrf3mlS2lk6x

https://www.youtube.com/watch?v=fNk_zzaMoSs&list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab

http://mathproofs.blogspot.com/2006/07/dot-product-and-cosine.html

http://152.67.89.169/1629722468-deep-learning-theory/basic%20maths.pdf

Jamboard access has been granted; use OBS to stream the screen with the following settings:

YouTube live key: 90sd-edfr-j2jk-e6jy-cm51 rtmp://a.rtmp.youtube.com/live2

Each task is worth 100 points.

Video from previous years: https://youtu.be/Nih4r7pmFBA

Jamboard: https://jamboard.google.com/d/1dsF6Jpal0_ql_NekddUARfH2HeJRz0Oul4d-GmEsnz4/edit?usp=sharing

Contents

To build intuition, you can start by showing http://playground.tensorflow.org
Explain the task: from a set of car listings (mileage, number of seats, engine displacement, number of gears), the goal is to predict the car's price and year of manufacture.
We use only scalar inputs: 4 (mileage, number of seats, engine displacement, number of gears). We predict only scalar outputs: 2 (car price and year of manufacture). Train on 10 samples at a time right from the start! Students completely fail to grasp matrix multiplication if we first work with each sample separately, so it is better not to offer that option at all. However, if the group turns out to be weak, it is advisable to start with per-sample calculations after all.
First, build the model WITHOUT training and explain the loss function, using random weights.
Inputs and outputs must be standardized before training (otherwise simple models train very poorly).
Explain the meaning of partial derivatives and the backpropagation algorithm.
During the lecture, ask students to submit their derivations of $\frac{\partial \mathcal{L}}{\partial b_1}$ and $\frac{\partial \mathcal{L}}{\partial W_2}$, either handwritten on paper and photographed, or typeset in LaTeX. Do not show the derivatives for b_1 and W_2 right away; show them only if students cannot work them out on their own from the W_1 and b_2 examples.
Finally, write working Python code together.
For homework, assign a different model and (preferably) a different dataset; the homework could also include implementing batches.
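The standardization step can be sketched as a per-column z-score: subtract each feature's mean and divide by its standard deviation, and keep both statistics so predictions can be mapped back to original units. The helper names `standardize` / `destandardize`, the `1e-8` guard and the toy car rows below are hypothetical placeholders, not the course data.

```python
import numpy as np

def standardize(data):
    """Z-score each column: subtract its mean and divide by its standard deviation."""
    mean = data.mean(axis=0)              # one mean per feature column
    std = data.std(axis=0) + 1e-8         # small constant guards against constant columns
    return (data - mean) / std, mean, std

def destandardize(data_std, mean, std):
    """Map standardized values (e.g. model predictions) back to original units."""
    return data_std * std + mean

# Hypothetical toy inputs: mileage (km), seats, engine displacement (l), gears
X = np.array([
    [150_000, 5, 1.6, 5],
    [ 80_000, 5, 2.0, 6],
    [220_000, 7, 2.5, 5],
], dtype=float)

X_std, X_mean, X_sd = standardize(X)
print(X_std.mean(axis=0))  # close to 0 for every column
print(X_std.std(axis=0))   # close to 1 for every column
```

The same `mean` and `std` computed on the training data should be reused for any new samples; the outputs are standardized the same way and predictions are converted back with `destandardize`.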

Explain how the choice of loss function affects gradient descent: MAE vs MSE.
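A quick way to show the MAE vs MSE difference is to compare the per-sample gradients with respect to the prediction for small and large errors. This is a minimal sketch; the function names and the error values are illustrative assumptions.

```python
import numpy as np

def mae_grad(y_pred, y):
    # d/dy' |y' - y| = sign(y' - y): the same magnitude (1) for every non-zero error
    return np.sign(y_pred - y)

def mse_grad(y_pred, y):
    # d/dy' (y' - y)^2 = 2 (y' - y): grows linearly with the error
    return 2.0 * (y_pred - y)

y = np.zeros(3)
y_pred = np.array([0.1, 1.0, 10.0])  # small, medium and outlier-sized errors

print(mae_grad(y_pred, y))
print(mse_grad(y_pred, y))
```

With MSE the outlier (error 10) gets a gradient 100 times larger than the small error (0.1), so it dominates the weight update. MAE gives every sample the same gradient magnitude, which is more robust to outliers, but the gradient does not shrink near the minimum, so the steps do not naturally get smaller as training converges.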

3.2. Implement the model from Jamboard without weight training

Implement the model from Jamboard without weight training (fill in the TODOs). Submit the code and a screenshot of the best results.

Data: http://share.yellowrobot.xyz/upic/9107b8c805a5cb4e4b44572bd2e7e43e_1675358285.jpg

Model: http://share.yellowrobot.xyz/upic/0ecb8948a61e5024063ca9811d4a09e2_1675358277.jpg

Template:

http://share.yellowrobot.xyz/quick/2023-2-2-6F8B72ED-ECF2-4E9D-A73C-7C7E950F467E.zip
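The Jamboard model is only linked as an image, so this sketch assumes the Linear → sigmoid → Linear form written out in the Model section of these notes, with matrix weights so a whole batch of 10 samples (4 inputs, 2 outputs) goes through at once. The hidden size 8, the random placeholder data and the row-vector convention `x @ W + b` (equivalent to $W \cdot x + b$ with transposed shapes) are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(x, W, b):
    # Row-vector convention: x is (batch, in), W is (in, out), b is (out,)
    return x @ W + b

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def model(x, W_1, b_1, W_2, b_2):
    # y' = Linear(sigmoid(Linear(x, W_1, b_1)), W_2, b_2)
    return linear(sigmoid(linear(x, W_1, b_1)), W_2, b_2)

def mae(y_pred, y):
    # Summed as in the notes' L_MAE; averaging with np.mean is also common
    return np.sum(np.abs(y_pred - y))

# Placeholder standardized data: 10 samples, 4 inputs, 2 outputs (NOT the course dataset)
X = rng.standard_normal((10, 4))
Y = rng.standard_normal((10, 2))

hidden = 8  # hypothetical hidden-layer size
W_1 = rng.standard_normal((4, hidden))
b_1 = np.zeros(hidden)
W_2 = rng.standard_normal((hidden, 2))
b_2 = np.zeros(2)

Y_pred = model(X, W_1, b_1, W_2, b_2)
print(Y_pred.shape)     # (10, 2)
print(mae(Y_pred, Y))   # loss of the untrained model with random weights
```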

3.3. Implement derivatives and model training

Implement derivatives and model training with SGD

Template: http://share.yellowrobot.xyz/quick/2023-2-2-928FB6AD-BE82-4213-AF13-89D008CDE031.zip

Submit the code and a screenshot of the best results.
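A minimal sketch of backpropagation and gradient-descent training for the same assumed Linear → sigmoid → Linear model with the summed MAE loss. Assumptions: placeholder random data with a synthetic learnable target instead of the course dataset, hidden size 8, learning rate `alpha = 1e-3`, and one full batch of 10 samples per step (plain batch gradient descent; true SGD would sample a different mini-batch at each step).

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Placeholder standardized data (NOT the course dataset): 10 samples, 4 inputs, 2 outputs
X = rng.standard_normal((10, 4))
Y = np.tanh(X @ rng.standard_normal((4, 2)))  # synthetic learnable target

hidden = 8  # hypothetical hidden-layer size
W_1 = rng.standard_normal((4, hidden)) * 0.5
b_1 = np.zeros(hidden)
W_2 = rng.standard_normal((hidden, 2)) * 0.5
b_2 = np.zeros(2)

alpha = 1e-3  # learning rate
eps = 1e-8    # keeps a / (|a| + eps) finite when a == 0
losses = []

for step in range(3000):
    # Forward pass: y' = Linear(sigmoid(Linear(x, W_1, b_1)), W_2, b_2)
    Z_1 = X @ W_1 + b_1
    H = sigmoid(Z_1)
    Y_pred = H @ W_2 + b_2

    a = Y_pred - Y
    losses.append(np.sum(np.abs(a)))  # L_MAE as a sum, as in the notes

    # Backward pass: chain rule from the loss back to each parameter
    dY = a / (np.abs(a) + eps)        # dL/dy' for MAE
    dW_2 = H.T @ dY                   # dL/dW_2
    db_2 = dY.sum(axis=0)             # dL/db_2
    dH = dY @ W_2.T                   # dL/dH
    dZ_1 = dH * H * (1.0 - H)         # sigmoid'(z) = sigmoid(z) (1 - sigmoid(z))
    dW_1 = X.T @ dZ_1                 # dL/dW_1
    db_1 = dZ_1.sum(axis=0)           # dL/db_1

    # Gradient descent update: p' = p - alpha * dL/dp
    W_1 -= alpha * dW_1
    b_1 -= alpha * db_1
    W_2 -= alpha * dW_2
    b_2 -= alpha * db_2

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Plotting `losses` (for example with matplotlib) gives the loss curve requested for submission.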

3.4. Homework - Implement a new model

Using only NumPy and building on your solution to task 3.3, implement the model: http://share.yellowrobot.xyz/upic/3340e3c11b330f49d79eb2c4f8c72426_1675358207.jpg
Use MSE instead of MAE as the loss function.
Add mileage as an additional feature to the X dataset. For example, a car manufactured in 2002 with 300k mileage becomes the sample [2.0, 3.0], while a car manufactured in 2011 with 75k mileage becomes [11.0, 0.75]. Choose 4 data samples yourself and predict their prices from these multiple input features.
The model should use matrix weights instead of scalar weights, for example, W_1.shape = (2, 8).
Train the model, then submit the code and a screenshot of the loss plot.
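The feature encoding in the examples above (years counted from 2000, mileage in units of 100k) can be sketched as follows; `encode_car` is a hypothetical helper name, and the scaling is inferred only from the two example cars. The last lines show why `W_1.shape = (2, 8)` fits: a batch of 2-feature rows multiplied by it gives one 8-dimensional hidden vector per car.

```python
import numpy as np

def encode_car(year, mileage):
    """Hypothetical helper: years after 2000, mileage in units of 100k (300k -> 3.0)."""
    return [year - 2000.0, mileage / 100_000.0]

X = np.array([
    encode_car(2002, 300_000),  # -> [2.0, 3.0]
    encode_car(2011, 75_000),   # -> [11.0, 0.75]
])

W_1 = np.zeros((2, 8))  # matrix weights with the shape suggested in the task
print((X @ W_1).shape)  # (2, 8): one 8-dimensional hidden vector per car
```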

$\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$

$Model(x, W_1, b_1, W_2, b_2, W_3, b_3) = Linear(\tanh(Linear(\tanh(Linear(x, W_1, b_1)), W_2, b_2)), W_3, b_3) = y'$

$$\begin{aligned} \mathcal{L}_{MAE} &= \sum |y' - y| \\ \mathcal{L}_{MSE} &= \sum (y' - y)^2 \end{aligned} \tag{2}$$

$$\begin{aligned} \frac{\partial \mathcal{L}_{MAE}(y', y)}{\partial b_1} &= ? \\ \frac{\partial \mathcal{L}_{MAE}(y', y)}{\partial b_2} &= ? \\ b_1' &= b_1 - \frac{\partial \mathcal{L}_{MAE}(y', y)}{\partial b_1} \cdot \alpha \\ b_2' &= b_2 - \frac{\partial \mathcal{L}_{MAE}(y', y)}{\partial b_2} \cdot \alpha \end{aligned} \tag{3}$$

MAE derivative

$\mathcal{L}_{MAE} = \sum |y'-y|$

Per sample, with $a = y' - y$: $\mathcal{L}_{MAE} = |a| = \sqrt{a^2} = (a^2)^{\frac{1}{2}}$

$\frac{\partial \mathcal{L}_{MAE}}{\partial y'} = ?$

$\frac{1}{n} = n^{-1}$

$\frac{1}{n^{10}} = n^{-10}$

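$$\begin{aligned} \frac{\partial \mathcal{L}_{MAE}}{\partial a} &= \frac{1}{2}(a^2)^{\frac{1}{2}-1} \cdot \frac{\partial a^2}{\partial a} \\ &= \frac{1}{2}(a^2)^{-\frac{1}{2}} \cdot 2a \\ &= a \cdot (a^2)^{-\frac{1}{2}} \\ &= \frac{a}{(a^2)^{\frac{1}{2}}} = \frac{a}{\sqrt{a^2}} = \frac{a}{|a|} \approx \frac{a}{|a| + \epsilon} \end{aligned}$$

Since $\frac{\partial a}{\partial y'} = 1$, this is also $\frac{\partial \mathcal{L}_{MAE}}{\partial y'}$, i.e. the sign of $y' - y$. The small constant $\epsilon$ is added only in code, to avoid division by zero when $a = 0$.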
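The derived MAE gradient can be checked numerically with a central finite difference. This is a minimal sketch; `mae_derivative`, the step `h` and the test points are illustrative choices.

```python
import numpy as np

def mae_derivative(a, eps=1e-8):
    # d|a|/da = a / |a|, with eps added so that a == 0 does not divide by zero
    return a / (np.abs(a) + eps)

a = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
h = 1e-6  # central finite-difference step
numeric = (np.abs(a + h) - np.abs(a - h)) / (2 * h)

print(mae_derivative(a))  # approximately sign(a)
print(numeric)
```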

Tanh

$$\begin{aligned} \tanh(x) &= \frac{e^x - e^{-x}}{e^x + e^{-x}} \\ \frac{\partial \tanh(x)}{\partial x} &= 1 - \tanh(x)^2 \end{aligned} \tag{4}$$
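A short numerical check of (4), as a sketch: the explicit exponential formula should match `np.tanh`, and $1 - \tanh(x)^2$ should match a central finite difference. The grid of points and the step `h` are arbitrary choices.

```python
import numpy as np

def tanh_derivative(x):
    return 1.0 - np.tanh(x) ** 2

x = np.linspace(-3, 3, 7)
manual_tanh = (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

h = 1e-6  # central finite-difference step
numeric = (np.tanh(x + h) - np.tanh(x - h)) / (2 * h)

print(tanh_derivative(x))
print(numeric)
```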

Linear function

$Linear(x, W, b) = W \cdot x + b$

$\frac{\partial Linear(x, W, b)}{\partial W} = x$

$\frac{\partial Linear(x, W, b)}{\partial x} = W$

$\frac{\partial Linear(x, W, b)}{\partial b} = 1 \cdot b^0 = 1$
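For a batch with matrix weights, the scalar rules above turn into matrix products. This sketch assumes the row-vector convention $Y = X W + b$ and an upstream gradient `G` standing in for $\frac{\partial \mathcal{L}}{\partial Y}$ from later layers; the names `linear_backward` and `G` are hypothetical. It verifies the weight and bias gradients with finite differences.

```python
import numpy as np

rng = np.random.default_rng(1)

def linear(x, W, b):
    # Batched Linear with row vectors: x (N, in), W (in, out), b (out,)
    return x @ W + b

def linear_backward(x, W, grad_out):
    # Matrix form of dLinear/dW = x, dLinear/dx = W, dLinear/db = 1
    grad_W = x.T @ grad_out          # (in, out)
    grad_x = grad_out @ W.T          # (N, in)
    grad_b = grad_out.sum(axis=0)    # (out,): the bias is shared by every sample
    return grad_W, grad_x, grad_b

x = rng.standard_normal((5, 3))
W = rng.standard_normal((3, 2))
b = rng.standard_normal(2)
G = rng.standard_normal((5, 2))  # stands in for dL/dY coming from later layers

def loss(W_, b_):
    # Scalar loss whose gradient with respect to linear(...) is exactly G
    return np.sum(linear(x, W_, b_) * G)

grad_W, grad_x, grad_b = linear_backward(x, W, G)

# Finite-difference check of grad_W, one entry at a time
h = 1e-6
numeric_W = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        W_plus, W_minus = W.copy(), W.copy()
        W_plus[i, j] += h
        W_minus[i, j] -= h
        numeric_W[i, j] = (loss(W_plus, b) - loss(W_minus, b)) / (2 * h)

# Finite-difference check of grad_b, one bias entry at a time
numeric_b = np.array([
    (loss(W, b + h * e) - loss(W, b - h * e)) / (2 * h) for e in np.eye(2)
])
```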

Sigmoid function

$\sigma(x) = \frac{1}{1 + e^{-x}}$

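$$\begin{aligned} \frac{\partial \sigma(x)}{\partial x} &= \frac{\partial}{\partial x} \frac{1}{1 + e^{-x}} = \frac{\partial}{\partial x} (1+e^{-x})^{-1} \\ &= -1 \cdot (1+e^{-x})^{-2} \cdot \frac{\partial (1+e^{-x})}{\partial x} \\ &= -1 \cdot (1+e^{-x})^{-2} \cdot e^{-x} \cdot \frac{\partial (-x)}{\partial x} \\ &= \frac{e^{-x}}{(1+ e^{-x})^2} = \sigma(x) (1 - \sigma(x)) \end{aligned}$$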

$\frac{d}{dx} \frac{1}{f(x)} = \frac{d}{dx} f(x)^{-1} = -f(x)^{-2} \cdot \frac{d f(x)}{dx}$

$\frac{d}{dx} e^{f(x)} = e^{f(x)} \cdot \frac{d f(x)}{dx}$

https://towardsdatascience.com/derivative-of-the-sigmoid-function-536880cf918e
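The sigmoid result can be sanity-checked the same way: $\sigma(x)(1 - \sigma(x))$ should equal the intermediate form $\frac{e^{-x}}{(1 + e^{-x})^2}$ and a central finite difference. A minimal sketch with arbitrary test points:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.linspace(-4, 4, 9)
closed_form = np.exp(-x) / (1.0 + np.exp(-x)) ** 2  # e^{-x} / (1 + e^{-x})^2

h = 1e-6  # central finite-difference step
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)

print(sigmoid_derivative(0.0))  # 0.25, the maximum slope of the sigmoid
```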

Model

$y' = Linear(\sigma(Linear(x, W_1, b_1)), W_2, b_2) = W_2 \cdot \sigma(W_1 \cdot x + b_1) + b_2$
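For reference, applying the chain rule to this model with the MAE loss gives the following per-sample gradients. This assumes scalar weights and introduces $z = W_1 \cdot x + b_1$ only for brevity; for a batch, each expression is summed over the samples because $\mathcal{L}_{MAE}$ is a sum.

$$\begin{aligned} z &= W_1 \cdot x + b_1, \quad y' = W_2 \cdot \sigma(z) + b_2, \quad a = y' - y \\ \frac{\partial \mathcal{L}_{MAE}}{\partial b_2} &= \frac{a}{|a|} \\ \frac{\partial \mathcal{L}_{MAE}}{\partial W_2} &= \frac{a}{|a|} \cdot \sigma(z) \\ \frac{\partial \mathcal{L}_{MAE}}{\partial b_1} &= \frac{a}{|a|} \cdot W_2 \cdot \sigma(z)(1 - \sigma(z)) \\ \frac{\partial \mathcal{L}_{MAE}}{\partial W_1} &= \frac{a}{|a|} \cdot W_2 \cdot \sigma(z)(1 - \sigma(z)) \cdot x \end{aligned}$$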

Pseudo-Huber loss (a smooth approximation of the Huber loss)

https://www.wolframalpha.com/input?i2d=true&i=D%5B%5C%2840%29Power%5Bv%2C2%5D%5C%2840%29Sqrt%5B1+%2B+Power%5B%5C%2840%29Divide%5By+-+z%2Cv%5D%5C%2841%29%2C2%5D%5D-1%5C%2841%29%5C%2841%29%2Cz%5D

$$\begin{aligned} \mathcal{L}_{Huber} &= \sum \delta^2 \left( \sqrt{1 + \left(\frac{y - \hat{y}}{\delta}\right)^2} - 1 \right) \\ \frac{\partial \mathcal{L}_{Huber}}{\partial \hat{y}} &= \frac{\partial\, \delta^2 \left(1 + \left(\frac{y - \hat{y}}{\delta}\right)^2\right)^{\frac{1}{2}}}{\partial \left(1 + \left(\frac{y - \hat{y}}{\delta}\right)^2\right)} \cdot \frac{\partial \left(\frac{y - \hat{y}}{\delta}\right)^2}{\partial \left(\frac{y - \hat{y}}{\delta}\right)} \cdot \frac{\partial \frac{y - \hat{y}}{\delta}}{\partial \hat{y}} \\ &= 0.5\, \delta^2 \left(1 + \left(\frac{y - \hat{y}}{\delta}\right)^2\right)^{-\frac{1}{2}} \cdot 2 \left(\frac{y - \hat{y}}{\delta}\right) \cdot \left(-\frac{1}{\delta}\right) \\ &= \frac{\hat{y} - y}{\sqrt{1 + \left(\frac{y - \hat{y}}{\delta}\right)^2}} \end{aligned} \tag{5}$$
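A sketch that implements (5) and checks the per-sample gradient against finite differences of the summed loss; the function names, `delta=1.0` and the error values are illustrative assumptions.

```python
import numpy as np

def pseudo_huber(y_pred, y, delta=1.0):
    r = (y - y_pred) / delta
    return np.sum(delta ** 2 * (np.sqrt(1.0 + r ** 2) - 1.0))

def pseudo_huber_grad(y_pred, y, delta=1.0):
    # Per-sample derivative from (5): (y_hat - y) / sqrt(1 + ((y - y_hat) / delta)^2)
    r = (y - y_pred) / delta
    return (y_pred - y) / np.sqrt(1.0 + r ** 2)

y = np.zeros(3)
y_pred = np.array([0.1, 1.0, 10.0])  # small, medium and outlier-sized errors

h = 1e-6  # central finite-difference step, one prediction at a time
numeric = np.array([
    (pseudo_huber(y_pred + h * e, y) - pseudo_huber(y_pred - h * e, y)) / (2 * h)
    for e in np.eye(3)
])

print(pseudo_huber_grad(y_pred, y))
print(numeric)
```

For small errors the gradient is close to $\hat{y} - y$ (MSE-like), while for large errors its magnitude stays below $\delta$ (MAE-like), which is the reason for using this loss.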

Model #2

$$\begin{aligned} \hat{y} &= model(x, W_1, b_1, W_2, b_2, W_3, b_3) \\ &= Linear(\tanh(Linear(\tanh(Linear(x, W_1, b_1)), W_2, b_2)), W_3, b_3) \end{aligned} \tag{6}$$
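A forward-pass sketch of (6) with matrix weights, assuming the row-vector convention `x @ W + b`. The layer sizes (2 inputs, two hidden layers of 8, 1 output) are hypothetical, apart from `W_1.shape = (2, 8)` taken from task 3.4; weights are random and untrained.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(x, W, b):
    # Row-vector convention: x (N, in), W (in, out), b (out,)
    return x @ W + b

def model(x, W_1, b_1, W_2, b_2, W_3, b_3):
    # y_hat = Linear(Tanh(Linear(Tanh(Linear(x, W_1, b_1)), W_2, b_2)), W_3, b_3)
    h_1 = np.tanh(linear(x, W_1, b_1))
    h_2 = np.tanh(linear(h_1, W_2, b_2))
    return linear(h_2, W_3, b_3)

# Hypothetical sizes: 2 inputs (year, mileage), two hidden layers of 8, 1 output (price)
W_1, b_1 = rng.standard_normal((2, 8)), np.zeros(8)
W_2, b_2 = rng.standard_normal((8, 8)), np.zeros(8)
W_3, b_3 = rng.standard_normal((8, 1)), np.zeros(1)

x = np.array([[2.0, 3.0], [11.0, 0.75]])  # the two example cars from task 3.4
y_hat = model(x, W_1, b_1, W_2, b_2, W_3, b_3)
print(y_hat.shape)  # (2, 1)
```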