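import numpy as np
import pandas as pd

df_full = pd.read_csv('./datasets/housing_train.csv')

# Keep only numeric columns, then drop rows with missing values
df_full = df_full.select_dtypes(include=[np.number]).dropna()

# .iloc is position-based and cannot take a column name; look up the position first
df_y = df_full.iloc[:, df_full.columns.get_loc('SalePrice')]

# .ix was removed in pandas 1.0; use .loc for labels
df_y = df_full.loc[:, 'SalePrice']
df_x = df_full.loc[:, ['YearBuilt', 'OverallQual']]

df_x = df_full.loc[:100, 'Col']  # 'Col' is a placeholder; label slices include the end label
df_x = df_full.loc[:100]  # whole rows

# .loc['SalePrice'] alone would look for a row label; select the column explicitly
df_y = df_full.loc[:, 'SalePrice']

np_dataset_x = df_x.to_numpy()  # recommended equivalent of .values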
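A minimal sketch of how label-based .loc and position-based .iloc differ once dropna() has left gaps in the index. The frame df_demo and its values are made up for illustration and are not taken from the housing CSV.

```python
import pandas as pd

# Hypothetical toy data standing in for the housing CSV
df_demo = pd.DataFrame(
    {
        "YearBuilt": [1990, 2005, 1978, 2012],
        "OverallQual": [6, 8, 5, 9],
        "SalePrice": [150000, 240000, 110000, 310000],
    },
    index=[0, 2, 5, 7],  # gaps in the index, as dropna() would leave behind
)

# Label-based: column by name; a row slice includes its end label
y_by_label = df_demo.loc[:, "SalePrice"]
rows_by_label = df_demo.loc[:5]  # index labels 0, 2 and 5

# Position-based: third column; a row slice excludes its end position
y_by_position = df_demo.iloc[:, 2]
rows_by_position = df_demo.iloc[:2]  # positions 0 and 1 (labels 0 and 2)

# Feature matrix as a NumPy array
x_array = df_demo.loc[:, ["YearBuilt", "OverallQual"]].to_numpy()
print(x_array.shape)
```

With a gapped index, .loc[:5] and .iloc[:5] can return different rows, so it helps to decide up front whether a slice means labels or positions.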
Filter rows (negation)
df_threshold = df_threshold[~((df_threshold.ground_truth == 'neutral') & (df_threshold.confidence_pred == 'neutral|noise'))]
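Note that == 'neutral|noise' only matches that exact string. If the intent was "either neutral or noise" (an assumption, the note does not say), one way to express it is isin. A small sketch with made-up rows:

```python
import pandas as pd

# Hypothetical rows; only the column names come from the note above
df_threshold = pd.DataFrame({
    "ground_truth": ["neutral", "neutral", "happy", "neutral"],
    "confidence_pred": ["neutral", "noise", "neutral", "happy"],
})

# Rows to drop: ground truth is neutral AND the prediction is one of two labels
drop_mask = (df_threshold["ground_truth"] == "neutral") & (
    df_threshold["confidence_pred"].isin(["neutral", "noise"])
)

# ~ inverts the boolean mask, keeping everything that did not match
df_kept = df_threshold[~drop_mask]
print(df_kept)
```

Building the mask in a named variable first makes the negation easier to read than one long nested expression.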
Access rows
To select rows whose column value equals a scalar, some_value, use ==:

df.loc[df['column_name'] == some_value]

To select rows whose column value is in an iterable, some_values, use isin:

df.loc[df['column_name'].isin(some_values)]

Combine multiple conditions with &:

df.loc[(df['column_name'] >= A) & (df['column_name'] <= B)]

Note the parentheses. Due to Python's operator precedence rules, & binds more tightly than <= and >=, so the parentheses in the last example are necessary. Without them,

df['column_name'] >= A & df['column_name'] <= B

is parsed as

df['column_name'] >= (A & df['column_name']) <= B

which raises "ValueError: The truth value of a Series is ambiguous".
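The precedence point can be checked on a toy frame. This is a sketch with made-up values for df, A and B; Series.between is shown as one alternative for an inclusive range.

```python
import pandas as pd

# Hypothetical data and bounds, for illustration only
df = pd.DataFrame({"column_name": [1, 4, 7, 10]})
A, B = 3, 8

# Parenthesised conditions combined with &
in_range = df.loc[(df["column_name"] >= A) & (df["column_name"] <= B)]

# Series.between performs the same inclusive range check
in_range_between = df.loc[df["column_name"].between(A, B)]

# Without parentheses the expression becomes a chained comparison,
# which calls bool() on a Series and fails
try:
    df["column_name"] >= A & df["column_name"] <= B
    raised = False
except ValueError:
    raised = True

print(in_range)
print("unparenthesised form raised ValueError:", raised)
```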
Apply
df["Grades"].apply(lambda val: "Yes" if val < 45 else "No")
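A small sketch of the same rule on made-up grades, next to a vectorised np.where version that gives the same labels. The np.where variant is an alternative added here, not part of the original note.

```python
import numpy as np
import pandas as pd

# Hypothetical grades for illustration
df = pd.DataFrame({"Grades": [30, 45, 60, 44]})

# Element-wise: the lambda runs once per value
via_apply = df["Grades"].apply(lambda val: "Yes" if val < 45 else "No")

# Vectorised: one comparison over the whole column
via_where = pd.Series(np.where(df["Grades"] < 45, "Yes", "No"), index=df.index)

print(via_apply.tolist())
print(via_where.tolist())
```

For a simple two-way condition the vectorised form is usually faster on large columns; apply stays handy when the logic does not fit a single expression.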
Very useful for data science
# Show (almost) all columns and rows when printing
pd.options.display.max_columns = 999
pd.options.display.max_rows = 999

print(df["classes"].value_counts())
# append .to_dict() to get the counts as a dict

print(df_full.describe())
print(df["selling_price"].describe())
Summary functions: https://medium.com/analytics-vidhya/how-to-summarize-data-with-pandas-2c9edffafbaf
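A minimal sketch of value_counts and describe on made-up data. The column names match the snippet above, but the values are invented. pd.option_context is shown as a scoped alternative to setting pd.options globally.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "classes": ["cat", "dog", "cat", "bird", "cat"],
    "selling_price": [10.0, 20.0, 30.0, 40.0, 50.0],
})

# Frequency of each class, converted to a plain dict
class_counts = df["classes"].value_counts().to_dict()
print(class_counts)

# describe() returns a Series of summary statistics for a numeric column
price_summary = df["selling_price"].describe()
print(price_summary[["count", "mean", "min", "max"]])

# Widen the display limits only inside this block
with pd.option_context("display.max_rows", 999, "display.max_columns", 999):
    print(df.describe())
```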
Snippets:
https://jeffdelaney.me/blog/useful-snippets-in-pandas/
https://gist.github.com/bsweger/e5817488d161f37dcbd2
Open CSV
Edit CSV pandas