Video (18 Jul 2023, 13:00): https://youtube.com/live/Cc_aZssiFJM?feature=share
Jamboard: https://jamboard.google.com/d/1dxl7j39_SLdIReZVqMuugwQuPPy_vK_uZy_m8HutUXY/edit?usp=sharing
Edit rights have been granted for the Jamboard. The lecture must be recorded with OBS (screen recording). YouTube live streaming key: tas8-sfkq-v75j-5sap-d50b (⚠️ different stream key)
Each task is worth 100 points.
Use time series dataset of ETFs (stock trading): http://share.yellowrobot.xyz/quick/2021-12-11-BC6610D5-07C5-4F7E-9586-896D143D9302.csv
Template: http://share.yellowrobot.xyz/quick/2023-7-17-5307E9E6-349E-4EA8-A916-C4C18A6B46B2.zip
Manipulate dataset using Pandas:
Filter only those symbols that have daily data available for the whole range 01/10/2021 - 01/11/2021
Sort symbols by volatility of price over the whole time period (standard deviation of close price); filter the top 10 most volatile symbols
Group by fund_symbol
Sort by price_date
Normalize price with MinMax scaler for each symbol separately (0..1)
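The Pandas steps above can be sketched as follows. Column names `fund_symbol`, `price_date`, and `close` are assumed from the dataset, the dates are read as DD/MM/YYYY, and the per-symbol min-max below is equivalent to applying sklearn's `MinMaxScaler` to each symbol separately:

```python
import pandas as pd

def prepare(df, date_from='2021-10-01', date_to='2021-11-01', top_n=10):
    # keep only rows inside the requested date range
    df = df[(df['price_date'] >= date_from) & (df['price_date'] <= date_to)]

    # keep only symbols that have data for every trading day in the range
    n_days = df['price_date'].nunique()
    days_per_symbol = df.groupby('fund_symbol')['price_date'].nunique()
    df = df[df['fund_symbol'].isin(days_per_symbol[days_per_symbol == n_days].index)]

    # keep the top_n most volatile symbols (std of close price over the period)
    top = df.groupby('fund_symbol')['close'].std().nlargest(top_n).index
    df = df[df['fund_symbol'].isin(top)]

    # sort, then min-max normalize close price per symbol into 0..1
    df = df.sort_values(['fund_symbol', 'price_date']).copy()
    df['close_norm'] = df.groupby('fund_symbol')['close'].transform(
        lambda s: (s - s.min()) / (s.max() - s.min()))
    return df
```

Usage: `df = prepare(pd.read_csv('etfs.csv', parse_dates=['price_date']))`, where `etfs.csv` stands for the downloaded dataset file.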
Visualize dataset using Matplotlib:
Draw a line graph with the top 10 most volatile symbols overlaid: close price on Y, date on X (legend with symbol names)
Draw all of these symbols as subplots in 2 columns and 5 rows, with close price on Y and date on X
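A minimal Matplotlib sketch of both plots, assuming a DataFrame with `fund_symbol`, `price_date`, and `close` columns (names assumed from the dataset):

```python
import matplotlib.pyplot as plt

def plot_overlay(df):
    # one axes, all symbols overlaid: close price vs date
    fig, ax = plt.subplots(figsize=(12, 6))
    for symbol, group in df.groupby('fund_symbol'):
        ax.plot(group['price_date'], group['close'], label=symbol)
    ax.set_xlabel('date')
    ax.set_ylabel('close price')
    ax.legend()
    return fig

def plot_grid(df):
    # 5 rows x 2 columns, one subplot per symbol
    symbols = sorted(df['fund_symbol'].unique())
    fig, axes = plt.subplots(5, 2, figsize=(12, 20), sharex=True)
    for ax, symbol in zip(axes.ravel(), symbols):
        group = df[df['fund_symbol'] == symbol]
        ax.plot(group['price_date'], group['close'])
        ax.set_title(symbol)
    fig.tight_layout()
    return fig
```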
Use Titanic dataset with missing pieces of information: http://share.yellowrobot.xyz/quick/2023-7-16-9126FCE9-F954-49EE-BF79-56B319560E0C.csv Template: http://share.yellowrobot.xyz/quick/2023-7-17-862B8127-0A42-4B9A-B633-AFC0E5732B11.zip
Manipulate dataset using Pandas:
Load using pandas
Use argparse to select how missing data is handled: drop missing values, or place median or mean values in the missing places
Encode categorical data as one-hot-encoded data
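A possible sketch of this preprocessing (the file name `titanic.csv` and the fallback to the most frequent value for categorical gaps are assumptions, not part of the task text):

```python
import argparse
import pandas as pd

def handle_missing(df, strategy):
    # strategy: 'drop', 'median', or 'mean'
    if strategy == 'drop':
        return df.dropna()
    numeric = df.select_dtypes(include='number').columns
    fill = df[numeric].median() if strategy == 'median' else df[numeric].mean()
    df = df.copy()
    df[numeric] = df[numeric].fillna(fill)
    # categorical gaps (e.g. Embarked) fall back to the most frequent value
    for col in df.columns.difference(numeric):
        df[col] = df[col].fillna(df[col].mode().iloc[0])
    return df

def encode(df, categorical=('Sex', 'Embarked', 'Pclass')):
    # one-hot encode the categorical columns that are present
    return pd.get_dummies(df, columns=[c for c in categorical if c in df.columns])

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--missing', choices=['drop', 'median', 'mean'], default='median')
    args = parser.parse_args()
    df = handle_missing(pd.read_csv('titanic.csv'), args.missing)
    df = encode(df)
    print(df.head())
```

Run e.g. as `python preprocess.py --missing median` (script name assumed).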
Visualize dataset using Matplotlib:
Draw histograms with all features (Pclass, Survived, Sex, Age, SibSp, Parch, Fare, Embarked)
Reduce the dimensionality of all inputs using sklearn PCA (categorical data can be used because it is concatenated as one-hot encodings) and display in 2D, colored by survival category
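A sketch of the PCA visualization, assuming the DataFrame already holds numeric inputs (one-hot columns included) plus the `Survived` column:

```python
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

def plot_pca(df):
    # project all input features (one-hot included) onto 2 principal components
    x = df.drop(columns=['Survived']).to_numpy(dtype=float)
    y = df['Survived'].to_numpy()
    points = PCA(n_components=2).fit_transform(x)

    # one scatter call per class so each gets its own color and legend entry
    fig, ax = plt.subplots()
    for cls, label in [(0, 'did not survive'), (1, 'survived')]:
        mask = y == cls
        ax.scatter(points[mask, 0], points[mask, 1], label=label, s=10)
    ax.legend()
    return fig
```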
Data set: http://share.yellowrobot.xyz/1644411423-olympiad-2022/fer_features_dataset_incomplete.csv
Python template: http://share.yellowrobot.xyz/1644411423-olympiad-2022/3_template.py.zip
The data set contains emotion labels and a set of vectors characterizing eyebrows, eyes, lips, and teeth (if present). Some data do not have emotion labels available, but these can be used to improve model accuracy. Similarly, some data do not have tooth vectors available because the image may not show teeth. The original image size from which these vectors are obtained is 256x256 pixels.
A program must be developed that takes a CSV file with data as input and writes the missing labels back into the same file as output. The test file will not contain any labels. The model may use the training file and other additional files. Submit the task as a ZIP file containing the source code and instructions on how to use it.
Cluster the data using the k-means algorithm and reduce dimensions to 2D using t-SNE (both algorithms are available in sklearn).
Additionally, you can implement classification or clustering using sklearn SVM or some other model to predict the label.
Submit source code and screenshot of results. Visualize results using matplotlib.
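The k-means + t-SNE step can be sketched as below. The cluster count, perplexity, and other parameters are illustrative choices, not values given by the task; note that clustering happens in the original feature space and t-SNE is used only for the 2D visualization:

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE

def cluster_and_plot(x, n_clusters=4):
    # cluster in the original feature space
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(x)

    # reduce to 2D only for visualization
    points = TSNE(n_components=2, perplexity=5,
                  init='random', random_state=0).fit_transform(x)

    fig, ax = plt.subplots()
    scatter = ax.scatter(points[:, 0], points[:, 1], c=labels, s=10)
    fig.colorbar(scatter, label='cluster')
    return labels, fig
```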
Below is an example of clustering using lip vectors for positive and neutral emotions.
Below is an example of visualization of the emotion vectors.
| Neutral | Happiness | Sadness | Anger |
|---|---|---|---|
| (image) | (image) | (image) | (image) |
Below is the structure of a data record when the CSV is loaded as a dict.
```python
record = {
    'label': labels[Y],

    'eye_a_h_x1': X[0][0][0],
    'eye_a_h_y1': X[0][0][1],
    'eye_a_h_x2': X[0][1][0],
    'eye_a_h_y2': X[0][1][1],

    'eye_a_v_x1': X[1][0][0],
    'eye_a_v_y1': X[1][0][1],
    'eye_a_v_x2': X[1][1][0],
    'eye_a_v_y2': X[1][1][1],

    'eye_b_h_x1': X[2][0][0],
    'eye_b_h_y1': X[2][0][1],
    'eye_b_h_x2': X[2][1][0],
    'eye_b_h_y2': X[2][1][1],

    'eye_b_v_x1': X[3][0][0],
    'eye_b_v_y1': X[3][0][1],
    'eye_b_v_x2': X[3][1][0],
    'eye_b_v_y2': X[3][1][1],

    'brow_a_h_x1': X[4][0][0],
    'brow_a_h_y1': X[4][0][1],
    'brow_a_h_x2': X[4][1][0],
    'brow_a_h_y2': X[4][1][1],

    'brow_a_v_x1': X[5][0][0],
    'brow_a_v_y1': X[5][0][1],
    'brow_a_v_x2': X[5][1][0],
    'brow_a_v_y2': X[5][1][1],

    'brow_b_h_x1': X[6][0][0],
    'brow_b_h_y1': X[6][0][1],
    'brow_b_h_x2': X[6][1][0],
    'brow_b_h_y2': X[6][1][1],

    'brow_b_v_x1': X[7][0][0],
    'brow_b_v_y1': X[7][0][1],
    'brow_b_v_x2': X[7][1][0],
    'brow_b_v_y2': X[7][1][1],

    'lips_a_h_x1': X[8][0][0],
    'lips_a_h_y1': X[8][0][1],
    'lips_a_h_x2': X[8][1][0],
    'lips_a_h_y2': X[8][1][1],

    'lips_a_v_x1': X[9][0][0],
    'lips_a_v_y1': X[9][0][1],
    'lips_a_v_x2': X[9][1][0],
    'lips_a_v_y2': X[9][1][1],

    'lips_b_h_x1': X[10][0][0],
    'lips_b_h_y1': X[10][0][1],
    'lips_b_h_x2': X[10][1][0],
    'lips_b_h_y2': X[10][1][1],

    'lips_b_v_x1': X[11][0][0],
    'lips_b_v_y1': X[11][0][1],
    'lips_b_v_x2': X[11][1][0],
    'lips_b_v_y2': X[11][1][1],

    # teeth vectors are present only when the image shows teeth
    'teeth_h_x1': X[12][0][0] if len(X) > 12 else '',
    'teeth_h_y1': X[12][0][1] if len(X) > 12 else '',
    'teeth_h_x2': X[12][1][0] if len(X) > 12 else '',
    'teeth_h_y2': X[12][1][1] if len(X) > 12 else '',

    'teeth_v_x1': X[13][0][0] if len(X) > 13 else '',
    'teeth_v_y1': X[13][0][1] if len(X) > 13 else '',
    'teeth_v_x2': X[13][1][0] if len(X) > 13 else '',
    'teeth_v_y2': X[13][1][1] if len(X) > 13 else '',
}
```
```python
import numpy as np
import pandas as pd

df_full = pd.read_csv('./datasets/housing_train.csv')

# keep numeric columns only and drop rows with missing values
df_full = df_full.select_dtypes(include=[np.number]).dropna()

# select a single column as a Series (.ix is removed; use .loc / .iloc)
df_y = df_full.loc[:, 'SalePrice']

# select several columns as a DataFrame
df_x = df_full.loc[:, ['YearBuilt', 'OverallQual']]

# select the first rows of one column / whole rows
df_x = df_full.loc[:100, 'OverallQual']
df_x = df_full.iloc[:100]  # whole rows

# convert to a NumPy array
np_dataset_x = df_x.values
```
Filter rows (negation)
```python
# keep rows that do NOT match both conditions (note the ~ negation)
df_threshold = df_threshold[~((df_threshold.ground_truth == 'neutral') & (df_threshold.confidence_pred == 'neutral|noise'))]
```
Access rows
```python
df.loc[df['column_name'] == some_value]
```

To select rows whose column value is in an iterable `some_values`, use `isin`:

```python
df.loc[df['column_name'].isin(some_values)]
```

Combine multiple conditions with `&`:

```python
df.loc[(df['column_name'] >= A) & (df['column_name'] <= B)]
```

Note the parentheses: due to Python's operator precedence rules, `&` binds more tightly than `<=` and `>=`, so the parentheses in the last example are necessary. Without them,

```python
df['column_name'] >= A & df['column_name'] <= B
```

is parsed as

```python
df['column_name'] >= (A & df['column_name']) <= B
```
Apply
```python
df["Grades"].apply(lambda val: "Yes" if val < 45 else "No")
```
Very useful for data science:
```python
# show all columns/rows when printing (useful for wide DataFrames)
pd.options.display.max_columns = 999
pd.options.display.max_rows = 999

print(df["classes"].value_counts())  # add .to_dict() for a plain dict

print(df_full.describe())
print(df["selling_price"].describe())
```
summary functions: https://medium.com/analytics-vidhya/how-to-summarize-data-with-pandas-2c9edffafbaf
Snippets:
https://jeffdelaney.me/blog/useful-snippets-in-pandas/
https://gist.github.com/bsweger/e5817488d161f37dcbd2
Open CSV
Edit CSV with pandas