2025-Q1-AI-EN 9. Exam Handout

1. Task

Choose one correct answer for each question.

  1. Which statement is correct?

    1. Artificial intelligence nowadays is mostly a complex computer program that mainly consists of programming rules

    2. Artificial intelligence nowadays is mostly a mathematical model that mainly consists of mathematical equations

    3. Artificial intelligence nowadays is mostly a complex computer program that mainly consists of expert knowledge

  2. What does artificial intelligence learn from?

    1. Rules created by experts

    2. Data

    3. Rules created by a programmer

  3. Which of the following examples could be input data for an artificial intelligence model?

    1. The probability that a client will stop using the service

    2. How many times a client has logged into the system in the last 10 days

    3. The model’s weight values

  4. Which of the following examples could be output data from an artificial intelligence model?

    1. The probability that a client will stop using the service

    2. How many times a client has logged into the system in the last 10 days

    3. The model’s weight values

  5. To predict a product's price, what type of model is needed?

    1. Regression

    2. Classification

    3. Enumeration

  6. To predict whether a client will cancel their subscription to the service (churn), what type of model is needed?

    1. Regression

    2. Classification

    3. Enumeration

  7. In which environment is artificial intelligence usually trained?

    1. Matlab

    2. Python

    3. Power BI

  8. What data are needed to train a model that could be used in production?

    1. Training set

    2. Test set

    3. Validation set

    4. Training, test, and validation sets

  9. Which factor most affects the model's accuracy?

    1. Learning rate

    2. An unbalanced number of samples across classes in the training dataset

    3. Sample variety in the dataset

  10. For which application would artificial intelligence not be effective?

    1. Writing text advertisements

    2. Password and username verification when logging into websites

    3. Creating coloring books for children

    4. Music composition

  11. How similar is the artificial deep neural network model to the human natural neural network model?

    1. Almost identical, as evidenced by large language models, image models, and other models

    2. Very similar, because it models biochemical processes as activations are executed

    3. Not similar, because the artificial neural network model is mathematical and executes differently from the human natural neural network

  12. Which sequence of actions corresponds to training deep neural network models?

    1. Data normalization, splitting data into sets, model creation, loss function selection, additional metric selection, test cycle, validation cycle, epochs, training cycle, backpropagation

    2. Data normalization, splitting data into sets, model creation, epochs, training cycle, backpropagation, loss function selection, additional metric selection, test cycle, validation cycle

    3. Data normalization, splitting data into sets, model creation, loss function selection, additional metric selection, epochs, training cycle, backpropagation, test cycle, validation cycle

  13. What does an epoch mean in the training process of artificial neural networks?

    1. All samples in the training set are processed once, and one training run can contain many epochs

    2. A data normalization method that removes extreme values

    3. All samples in the training set are processed once, and a training run can contain only one epoch

    4. The validation samples are considered after training

  14. If the numerical value of the MSE loss function is 0.5, then after one training epoch the numerical value will most likely be:

    1. 0.6

    2. 0.5

    3. 0.4

  15. An RNN is usually used to:

    1. Recognize several objects in an image

    2. Predict stock prices from market data

    3. Predict car prices from an advertisement

  16. A ConvNet without data augmentation during training is capable of recognizing:

    1. Objects moved within the image

    2. Objects moved and rotated in the image

    3. Objects moved, enlarged, and rotated in the image

  17. The weights W of a pre-trained GRU at each time step:

    1. are different

    2. are the same

    3. are not specified

  18. What is the dot product of matrices?

    1. A mathematical operation that yields a perpendicular vector or matrix between input vectors

    2. A mathematical operation that performs matrix transformation using multiplication in any dimensions

    3. An algorithm that uses addition and multiplication in the last 2 dimensions in any matrices

  19. What is a Linear layer or function in artificial neural networks?

    1. The vector product of a matrix

    2. Multiplication by a weight matrix followed by the addition of a bias

    3. The linear regression algorithm

  20. Why is batch normalization needed before the activation function?

    1. To prevent overfitting

    2. To prevent dead neurons

    3. To prevent bias towards one class in predictions

  21. Which statement best captures the core goal of an auto-encoder in unsupervised learning?

    1. Map input data directly to class labels using labeled examples.

    2. Learn a compressed internal representation that can faithfully reconstruct the original input.

    3. Separate data into linearly separable clusters by maximizing class margins.

    4. Generate adversarial perturbations that fool a downstream classifier.

  22. Why is batch normalization needed before the activation function?

    1. To prevent overfitting

    2. To prevent dead neurons

    3. To prevent bias towards one class in predictions

  23. Which statement best describes how the latent z-dimension in an auto-encoder compares with the components obtained from Principal Component Analysis (PCA) when used for dimensionality reduction?

    1. The z-dimension must always equal the original input size, whereas PCA can use any smaller number of components.

    2. PCA components are strictly linear projections, while the z-dimension of an auto-encoder can capture nonlinear relationships determined by the network architecture.

    3. PCA requires iterative back-propagation to learn its components, but an auto-encoder derives its z-dimension analytically in closed form.

    4. Increasing the number of PCA components inevitably increases reconstruction error, whereas enlarging the z-dimension in an auto-encoder always increases it.

  24. Which task is best handled by a Convolutional Neural Network?

    1. Predicting the next word in a sentence

    2. Classifying handwritten digits in grayscale images

    3. Forecasting daily stock prices from a time series

    4. Recommending movies based on user-item ratings

  25. For which application is a Recurrent Neural Network the most appropriate choice?

    1. Sorting a batch of images by dominant color

    2. Detecting sentiment in a stream of tweets

    3. Mapping fixed-length feature vectors to output classes with no temporal order

    4. Segmenting objects in high-resolution satellite photos

  26. A plain feed-forward neural network (no weight sharing or recurrence) is generally the most suitable for:

    1. Predicting pixel values in a 2-D image grid

    2. Translating an English sentence into French word-by-word

    3. Mapping a set of tabular patient features to a disease probability

    4. Generating the next frame of a video sequence

  27. Why can accuracy be misleading on a highly imbalanced classification dataset?

    1. It penalizes false positives more than false negatives

    2. A model that predicts only the majority class can still achieve high accuracy

    3. Accuracy ignores true negatives entirely

    4. Accuracy is not affected by class distribution at all

  28. In a binary confusion matrix, which cell counts samples that were actually positive and predicted positive?

    1. True Positive

    2. False Positive

    3. True Negative

    4. False Negative

  29. Precision is computed with which formula?

    1. TP / (TP + FN)

    2. TP / (TP + FP)

    3. TN / (TN + FP)

    4. (TP + TN) / (TP + TN + FP + FN)

  30. A learning rate that is too large in SGD most often causes:

    1. Slow convergence but stable training

    2. Over-regularization of the model

    3. Oscillation or divergence of the loss

    4. Vanishing gradients in early layers

  31. Which mathematical expression best describes the ReLU activation?

    1. max(x,0)

    2. tanh(x)

    3. 1 / (1 + e^(-x))

    4. sign(x)

  32. Why must any useful hidden-layer activation be non-linear?

    1. Non-linearity allows weight decay to work.

    2. Without it, stacked layers collapse to a single linear transform.

    3. It guarantees convex loss surfaces.

    4. GPUs require non-linear math for speed.

  33. A dataset contains a binary “Customer_Type” column with values {“Retail”, “Wholesale”}. The simplest correct preprocessing step is:

    1. Scale to [-1, 1].

    2. One-hot encode into two columns.

    3. Embed into a 128-dimensional vector.

    4. Drop the column because only two categories exist.

  34. Which statement about category embeddings is false?

    1. They are learned jointly with the rest of the network during back-propagation.

    2. They can capture similarity (e.g., “France” closer to “Germany” than to “Australia”).

    3. Their dimensionality must equal the number of unique categories.

    4. They allow the network to generalise to unseen combinations of categorical values with numeric features.

  35. Why is a softmax layer typically added to the end of a multi-class classifier?

    1. To ensure each logit stays within the range [0, 1] before training starts

    2. To map raw logits into a normalized probability distribution that sums to 1

    3. To prevent exploding gradients during back-propagation

    4. To speed up matrix multiplications in the final layer

  36. In categorical cross-entropy, the target labels provided to the loss function should be

    1. Raw integer class indices (e.g., 0, 1, 2)

    2. One-hot encoded vectors or probability distributions

    3. Random noise to encourage regularization

    4. Log-scaled class frequencies

  37. Your regression targets are contaminated with rare but very large sensor glitches. You need a loss that minimally amplifies those extreme errors.

    1. L2 (mean-squared error)

    2. L1 (mean-absolute error)

    3. Huber loss with a small δ

    4. Binary cross-entropy

  38. Which situation most clearly calls for supervised learning?

    1. You have 60 000 chest X-ray images, each labeled “normal” or “pneumonia,” and you want to build a model that diagnoses new scans.

    2. You have 10 000 unlabeled chest X-ray images and want to discover visual groupings automatically.

    3. You wish to compress a 512-dimensional sensor signal into 32 features.

    4. You want to remove noise from speech recordings without any paired clean examples.

  39. A bank wants to spot previously unseen types of fraudulent transactions in a huge unlabeled dataset. Which is the most appropriate first step?

    1. Train a labeled classifier on past fraud cases

    2. Apply clustering or anomaly-detection (unsupervised) methods

    3. Use a value-function estimator (reinforcement learning)

    4. Fit a regression model to transaction amounts

  40. Which statement about supervised vs. unsupervised learning is true?

    1. Supervised models never require validation data.

    2. Unsupervised learning cannot be evaluated quantitatively.

    3. Supervised learning needs labeled targets; unsupervised does not.

    4. Unsupervised learning always outperforms supervised learning on classification tasks.


2. Task

List and describe all the steps needed to train a cat-and-dog photo classifier using PyTorch. You have been given 800 photos of cats and 200 photos of dogs. You may not use a pre-trained model. You need to create, train, and deploy the model in production, where it will be used to distinguish photos of cats from photos of dogs. Whenever you mention a keyword such as “model” or “loss function”, give the exact name and a description of the specific model, loss function, etc. that you would use for this task.

Your answer should name the exact methods for:

  1. Dataset pre-processing before training

  2. Dataset pre-processing during training

  3. Measuring the performance of the model

  4. Model architecture and why?

  5. Loss function and why?

  6. Optimizer and why?

  7. Training protocol, what to measure, when to stop

  8. How to use the model for inference