2023-Q3-AI 2. Numpy, Pandas, SkLearn, Matplotlib

 

2.1. Video / Materials

Video (18 Jul 2023, 13:00): https://youtube.com/live/Cc_aZssiFJM?feature=share

Jamboard: https://jamboard.google.com/d/1dxl7j39_SLdIReZVqMuugwQuPPy_vK_uZy_m8HutUXY/edit?usp=sharing

 


Edit rights have been granted on the Jamboard. The lecture must be recorded with OBS (screen recording). YouTube live streaming key: tas8-sfkq-v75j-5sap-d50b (⚠️ a different stream key)

Each task is worth 100 points


 

2.2. Analyze and clean

Use time series dataset of ETFs (stock trading): http://share.yellowrobot.xyz/quick/2021-12-11-BC6610D5-07C5-4F7E-9586-896D143D9302.csv

Template: http://share.yellowrobot.xyz/quick/2023-7-17-5307E9E6-349E-4EA8-A916-C4C18A6B46B2.zip

Manipulate dataset using Pandas:

  1. Keep only those symbols that have daily data available for the whole range 01/10/2021 - 01/11/2021

  2. Sort symbols by price volatility over the whole time period (standard deviation of close price) and keep the top 10 most volatile symbols

  3. Group by fund_symbol

  4. Sort by price_date

  5. Normalize price with MinMax scaler for each symbol separately (0..1)
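The Pandas steps above can be sketched as follows. This is a minimal sketch on synthetic data, not the template's solution. The column names `fund_symbol` and `price_date` come from the task text, while `close`, the helper name `top_volatile_normalized`, and the output column `close_norm` are assumptions. "Available daily" is interpreted here as "has a row for every date that appears in the dataset inside the window", which is also an assumption.

```python
import numpy as np
import pandas as pd


def top_volatile_normalized(df, start, end, top_n=10):
    """Filter, rank by volatility, and min-max scale close prices per symbol."""
    df = df.copy()
    df["price_date"] = pd.to_datetime(df["price_date"])
    start, end = pd.Timestamp(start), pd.Timestamp(end)

    # 1. Keep symbols that have a row for every date seen inside the window.
    window = df[(df["price_date"] >= start) & (df["price_date"] <= end)]
    n_dates = window["price_date"].nunique()
    days_per_symbol = window.groupby("fund_symbol")["price_date"].nunique()
    complete = days_per_symbol[days_per_symbol == n_dates].index
    df = df[df["fund_symbol"].isin(complete)]

    # 2. Volatility = standard deviation of close over the whole period.
    volatility = df.groupby("fund_symbol")["close"].std()
    top = volatility.nlargest(top_n).index
    df = df[df["fund_symbol"].isin(top)]

    # 3-4. Order rows by symbol, then by date within each symbol.
    df = df.sort_values(["fund_symbol", "price_date"]).reset_index(drop=True)

    # 5. Min-max scale close to 0..1 separately for each symbol.
    #    A symbol with a constant price would yield NaN here (division by zero).
    grouped = df.groupby("fund_symbol")["close"]
    lo = grouped.transform("min")
    hi = grouped.transform("max")
    df["close_norm"] = (df["close"] - lo) / (hi - lo)
    return df


# Synthetic demo: CCC lacks half of the dates, BBB is barely volatile.
dates = pd.date_range("2021-10-01", "2021-10-08", freq="D")
demo = pd.DataFrame({
    "fund_symbol": ["AAA"] * 8 + ["BBB"] * 8 + ["CCC"] * 4,
    "price_date": list(dates) * 2 + list(dates[:4]),
    "close": list(np.linspace(10, 24, 8)) + list(np.linspace(50, 51, 8)) + [5, 6, 7, 8],
})
result = top_volatile_normalized(demo, "2021-10-01", "2021-10-08", top_n=1)
print(result)
```

For the real dataset, the frame would come from `pd.read_csv` on the downloaded file, and the window bounds should be written unambiguously (for example `"2021-10-01"`), since `01/10/2021` can be read as either day-first or month-first.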

Visualize dataset using Matplotlib:

  1. Draw a line graph overlaying the top 10 most volatile symbols, with close price on Y and date on X (legend with symbol names)

  2. Draw all of these symbols as subplots in 2 columns and 5 rows, with close price on Y and date on X
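Both plots above can be sketched with Matplotlib as follows. This is a sketch on synthetic random-walk data. The helper names `plot_overlay` and `plot_grid` and the `close` column are assumptions, and the `Agg` backend is selected only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to open windows
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_overlay(df, value_col="close"):
    """One axes, one line per symbol, legend with symbol names."""
    fig, ax = plt.subplots(figsize=(10, 5))
    for symbol, group in df.groupby("fund_symbol"):
        ax.plot(group["price_date"], group[value_col], label=symbol)
    ax.set_xlabel("date")
    ax.set_ylabel(value_col)
    ax.legend(title="symbol")
    fig.autofmt_xdate()  # rotate date labels so they do not overlap
    return fig


def plot_grid(df, value_col="close", n_rows=5, n_cols=2):
    """One subplot per symbol, arranged in n_rows x n_cols."""
    fig, axes = plt.subplots(n_rows, n_cols, figsize=(12, 15),
                             sharex=True, squeeze=False)
    flat = axes.ravel()
    groups = list(df.groupby("fund_symbol"))
    for ax, (symbol, group) in zip(flat, groups):
        ax.plot(group["price_date"], group[value_col])
        ax.set_title(symbol)
        ax.set_ylabel(value_col)
    for ax in flat[len(groups):]:
        ax.set_visible(False)  # hide unused cells if fewer symbols are given
    fig.autofmt_xdate()
    fig.tight_layout()
    return fig


# Synthetic demo: ten symbols, each a random walk around 100.
rng = np.random.default_rng(0)
dates = pd.date_range("2021-10-01", periods=20, freq="D")
frames = [
    pd.DataFrame({
        "fund_symbol": f"SYM{i}",
        "price_date": dates,
        "close": 100 + rng.normal(0, 1, len(dates)).cumsum(),
    })
    for i in range(10)
]
demo = pd.concat(frames, ignore_index=True)

overlay_fig = plot_overlay(demo)
grid_fig = plot_grid(demo)
```

Calling `overlay_fig.savefig("overlay.png")` writes the figure to disk. Passing the normalized column instead of `close` as `value_col` makes the overlaid lines comparable on one scale.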

2.3. Analyze and clean Titanic dataset

Use Titanic dataset with missing pieces of information: http://share.yellowrobot.xyz/quick/2023-7-16-9126FCE9-F954-49EE-BF79-56B319560E0C.csv

Template: http://share.yellowrobot.xyz/quick/2023-7-17-862B8127-0A42-4B9A-B633-AFC0E5732B11.zip

Manipulate dataset using Pandas:

  1. Load using pandas

  2. Use argparse to select how missing data is handled: drop missing values, or fill them with the median or mean

  3. Encode categorical data as one-hot-encoded data
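The steps above can be sketched as follows, on a tiny hand-made frame instead of the real CSV. The flag name `--missing` and the helpers `parse_args` and `handle_missing` are hypothetical. Mean and median are undefined for text columns, so this sketch fills those with the most frequent value, which is an assumption rather than a requirement of the task.

```python
import argparse

import numpy as np
import pandas as pd


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Titanic cleaning sketch")
    parser.add_argument("--missing", choices=["drop", "median", "mean"],
                        default="drop",
                        help="how to handle missing values")
    return parser.parse_args(argv)


def handle_missing(df, strategy):
    """Drop incomplete rows, or fill gaps column by column."""
    if strategy == "drop":
        return df.dropna().reset_index(drop=True)
    out = df.copy()
    for col in out.columns:
        if not out[col].isna().any():
            continue
        if pd.api.types.is_numeric_dtype(out[col]):
            value = out[col].median() if strategy == "median" else out[col].mean()
        else:
            # Assumption: categorical gaps get the most frequent value.
            value = out[col].mode().iloc[0]
        out[col] = out[col].fillna(value)
    return out


demo = pd.DataFrame({
    "Pclass": [1, 3, 2, 3],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, np.nan, 30.0, 40.0],
    "Fare": [7.25, 71.3, np.nan, 8.05],
    "Embarked": ["S", "C", None, "S"],
    "Survived": [0, 1, 1, 0],
})

# An explicit argument list is passed here; parse_args() with no argument
# would read sys.argv when the file is run as a script.
args = parse_args(["--missing", "median"])
clean = handle_missing(demo, args.missing)

# One-hot encode the categorical columns; numeric columns pass through.
encoded = pd.get_dummies(clean, columns=["Sex", "Embarked"], dtype=float)
print(encoded)
```

Run as a script, the same choice would be made from the command line, for example `python clean.py --missing mean` (the file name is illustrative).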

Visualize dataset using Matplotlib:

  1. Draw histograms with all features (Pclass, Survived, Sex, Age, SibSp, Parch, Fare, Embarked)

  2. Reduce the dimensionality of all inputs using sklearn PCA (categorical data can be included, because it is concatenated as one-hot encoded columns) and display the result in 2D, colored by survival category
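The two visualizations above can be sketched as follows on synthetic data with the same column names. The helpers `plot_histograms` and `plot_pca` are hypothetical. Excluding `Survived` from the PCA inputs and standardizing features with `StandardScaler` first are assumptions of this sketch, and the input is expected to have no missing values.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to open windows
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def plot_histograms(df, columns, n_cols=4):
    """Histogram for numeric columns, bar chart of counts for text columns."""
    n_rows = -(-len(columns) // n_cols)  # ceiling division
    fig, axes = plt.subplots(n_rows, n_cols,
                             figsize=(4 * n_cols, 3 * n_rows), squeeze=False)
    flat = axes.ravel()
    for ax, col in zip(flat, columns):
        if pd.api.types.is_numeric_dtype(df[col]):
            ax.hist(df[col].dropna(), bins=20)
        else:
            counts = df[col].value_counts()
            ax.bar([str(v) for v in counts.index], counts.to_numpy())
        ax.set_title(col)
    for ax in flat[len(columns):]:
        ax.set_visible(False)
    fig.tight_layout()
    return fig


def plot_pca(df, label_col="Survived"):
    """Project one-hot encoded, standardized features to 2D and color by label."""
    features = pd.get_dummies(df.drop(columns=[label_col]), dtype=float)
    scaled = StandardScaler().fit_transform(features)
    points = PCA(n_components=2).fit_transform(scaled)
    fig, ax = plt.subplots(figsize=(6, 5))
    scatter = ax.scatter(points[:, 0], points[:, 1],
                         c=df[label_col].to_numpy(), cmap="coolwarm", s=15)
    ax.legend(*scatter.legend_elements(), title=label_col)
    ax.set_xlabel("PC 1")
    ax.set_ylabel("PC 2")
    return fig, points


rng = np.random.default_rng(0)
n = 60
demo = pd.DataFrame({
    "Pclass": rng.integers(1, 4, n),
    "Survived": rng.integers(0, 2, n),
    "Sex": rng.choice(["male", "female"], n),
    "Age": rng.uniform(1, 70, n),
    "SibSp": rng.integers(0, 4, n),
    "Parch": rng.integers(0, 3, n),
    "Fare": rng.exponential(30, n),
    "Embarked": rng.choice(["S", "C", "Q"], n),
})

hist_fig = plot_histograms(demo, list(demo.columns))
pca_fig, points = plot_pca(demo)
```

Without scaling, `Fare` and `Age` would dominate the principal components simply because their numeric ranges are larger than those of the one-hot columns.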


 

2.4. Homework - Clean emotion data, cluster it, and visualize it

Data set: http://share.yellowrobot.xyz/1644411423-olympiad-2022/fer_features_dataset_incomplete.csv

Python template: http://share.yellowrobot.xyz/1644411423-olympiad-2022/3_template.py.zip

The data set contains emotion labels and a set of vectors characterizing eyebrows, eyes, lips, and teeth (if present). Some data do not have emotion labels available, but these can be used to improve model accuracy. Similarly, some data do not have tooth vectors available because the image may not show teeth. The original image size from which these vectors are obtained is 256x256 pixels.

A program needs to be developed that takes a CSV file with data as input and saves the missing labels in the same file as output. The test file will not contain any labels. The model is allowed to use the training file and other additional files. Submit the task as a ZIP file, which contains the source code and instructions on how to use it.

Cluster the data using the k-means algorithm and reduce the dimensions to 2D using t-SNE (both algorithms are available in SkLearn).

Additionally, you can implement classification or clustering using an SkLearn SVM or some other model to predict the labels.
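The k-means and t-SNE steps can be sketched as follows. This uses synthetic Gaussian blobs in place of the real feature vectors. The helper name `cluster_and_embed`, the choice of four clusters (one per emotion shown in this handout), and the standardization step are assumptions. Rows with missing vectors would have to be dropped or imputed before this point.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to open windows
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler


def cluster_and_embed(features, n_clusters=4, perplexity=30, seed=0):
    """Return k-means cluster ids and a 2D t-SNE embedding of the features."""
    scaled = StandardScaler().fit_transform(features)
    cluster_ids = KMeans(n_clusters=n_clusters, n_init=10,
                         random_state=seed).fit_predict(scaled)
    # t-SNE requires perplexity to be smaller than the number of samples.
    points = TSNE(n_components=2, perplexity=perplexity,
                  random_state=seed).fit_transform(scaled)
    return cluster_ids, points


# Synthetic stand-in for the feature vectors: 4 blobs in 10 dimensions.
rng = np.random.default_rng(0)
centers = rng.normal(0, 10, size=(4, 10))
features = np.vstack([c + rng.normal(0, 1, size=(30, 10)) for c in centers])

cluster_ids, points = cluster_and_embed(features)

fig, ax = plt.subplots(figsize=(6, 5))
ax.scatter(points[:, 0], points[:, 1], c=cluster_ids, cmap="tab10", s=15)
ax.set_title("t-SNE embedding colored by k-means cluster")
```

Cluster ids from k-means are arbitrary integers. To turn them into emotion labels, one option is to assign each cluster the most common known label among its labelled members.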

 

Submit source code and screenshot of results. Visualize results using matplotlib.

 

Below is an example of clustering using lip vectors for positive and neutral emotions.

[Figure: clustering of lip vectors for positive and neutral emotions]

Below is an example of the visualization of emotion vectors.

[Figure: example emotion vectors for Neutral, Happiness, Sadness, Anger]

Below is the structure of a data record when the CSV is loaded as a dict.

 

 


 

Materials

 

Filter rows (negation)
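A minimal sketch of negated row filtering in pandas, using a made-up frame. A boolean mask is inverted with `~`, so the rows that do not match are kept.

```python
import pandas as pd

df = pd.DataFrame({
    "fund_symbol": ["AAA", "BBB", "CCC", "AAA"],
    "close": [10.0, 20.0, 30.0, 11.0],
})

# Boolean mask, then its negation with ~
is_aaa = df["fund_symbol"] == "AAA"
not_aaa = df[~is_aaa]

# Negated membership test: keep rows whose symbol is NOT in the list
not_in_list = df[~df["fund_symbol"].isin(["AAA", "BBB"])]

# Combining conditions needs parentheses around each part
cheap_non_aaa = df[~is_aaa & (df["close"] < 25)]

print(not_aaa)
print(not_in_list)
print(cheap_non_aaa)
```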

 

Access rows
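A minimal sketch of the usual ways to access rows in pandas, on a made-up frame with a string index: `iloc` works by position, `loc` and `at` work by label.

```python
import pandas as pd

df = pd.DataFrame(
    {"fund_symbol": ["AAA", "BBB", "CCC"], "close": [10.0, 20.0, 30.0]},
    index=["x", "y", "z"],
)

first = df.iloc[0]           # row by position, returned as a Series
by_label = df.loc["y"]       # row by index label
cell = df.at["z", "close"]   # single value by row label and column name
middle = df.iloc[1:3]        # positional slice, end is exclusive

# Row-by-row iteration works, but is slow on large frames.
pairs = [(idx, row["close"]) for idx, row in df.iterrows()]
print(pairs)
```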

 

Apply
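A minimal sketch of `apply` on a made-up frame: `DataFrame.apply` with `axis=1` calls the function once per row, and `Series.apply` calls it once per value. The column names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"high": [11.0, 21.0, 33.0], "low": [9.0, 20.0, 30.0]})

# Row-wise: the lambda receives each row as a Series.
df["spread"] = df.apply(lambda row: row["high"] - row["low"], axis=1)

# Element-wise on a single column.
df["bucket"] = df["spread"].apply(lambda s: "wide" if s >= 2 else "narrow")

# The same arithmetic written as a vectorized expression, which is
# usually much faster than apply on large frames.
df["spread_fast"] = df["high"] - df["low"]
print(df)
```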

 

 

Very useful for data science:

summary functions: https://medium.com/analytics-vidhya/how-to-summarize-data-with-pandas-2c9edffafbaf

 

Snippets:

https://jeffdelaney.me/blog/useful-snippets-in-pandas/

https://gist.github.com/bsweger/e5817488d161f37dcbd2

[Image: pandas_cheats]

 

Open CSV
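A minimal sketch of opening a CSV with pandas. The text is read from an in-memory buffer so the example is self-contained. With a real file, a path or URL string is passed to `pd.read_csv` instead. The column names are illustrative.

```python
import io

import pandas as pd

csv_text = """fund_symbol,price_date,close
AAA,2021-10-01,10.0
AAA,2021-10-04,10.5
BBB,2021-10-01,20.0
"""

# parse_dates converts the listed columns to datetime64 while loading.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["price_date"])

print(df.head())    # first rows
print(df.dtypes)    # column types
print(df.shape)     # (rows, columns)
```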


 

Edit CSV pandas
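A minimal sketch of editing a CSV in place with pandas: load it, change values, and write it back to the same path. The file is created in a temporary directory, and the columns `id` and `label` and the fill value are made up for illustration.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Create a small CSV to edit (stands in for an existing file).
path = os.path.join(tempfile.mkdtemp(), "records.csv")
pd.DataFrame({"id": [1, 2, 3], "label": ["happy", np.nan, "sad"]}).to_csv(
    path, index=False
)

df = pd.read_csv(path)

# Fill in the rows where the label is missing.
missing = df["label"].isna()
df.loc[missing, "label"] = "neutral"  # placeholder for a predicted label

# index=False avoids writing the row index as an extra column.
df.to_csv(path, index=False)

reloaded = pd.read_csv(path)
print(reloaded)
```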


 

[Images: explore_pandas, cell, cond, sort, func, vals, gr]