2025-Q1-AI-EN 9. Exam Handout

1. Task

Choose one correct answer for each question.

  1. Which statement is correct?

    1. Artificial intelligence nowadays is mostly a complex computer program that mainly consists of programming rules

    2. Artificial intelligence nowadays is mostly a mathematical model that mainly consists of mathematical equations

    3. Artificial intelligence nowadays is mostly a complex computer program that mainly consists of expert knowledge

  2. What does artificial intelligence learn from?

    1. Rules created by experts

    2. Data

    3. Rules created by a programmer

  3. Which of the following examples could be input data for an artificial intelligence model?

    1. The probability that a client will stop using the service

    2. How many times a client has logged into the system in the last 10 days

    3. The model’s weight values

  4. Which of the following examples could be output data from an artificial intelligence model?

    1. The probability that a client will stop using the service

    2. How many times a client has logged into the system in the last 10 days

    3. The model’s weight values

  5. To predict a product's price, what type of model is needed?

    1. Regression

    2. Classification

    3. Enumeration

  6. To predict whether a client will cancel their subscription to the service (churn), what type of model is needed?

    1. Regression

    2. Classification

    3. Enumeration

  7. In which environment is artificial intelligence usually trained?

    1. Matlab

    2. Python

    3. Power BI

  8. What data are needed to train a model that could be used in production?

    1. Training set

    2. Test set

    3. Validation set

    4. Training, test, and validation sets

  9. Which factor most affects the model's accuracy?

    1. Learning rate

    2. An unbalanced number of samples across classes in the training dataset

    3. Sample variety in the dataset

  10. For which application would artificial intelligence not be effective?

    1. Writing text advertisements

    2. Password and username verification when logging into websites

    3. Creating coloring books for children

    4. Music composition

  11. How similar is the artificial deep neural network model to the human natural neural network model?

    1. Almost identical, as evidenced by large language models, image models, and other models

    2. Very similar, because it models biochemical processes as activations are executed

    3. Not similar, because the artificial neural network model is mathematical and executes differently from the human natural neural network

  12. Which sequence of actions corresponds to training deep neural network models?

    1. Data normalization, splitting data into sets, model creation, loss function selection, additional metric selection, test cycle, validation cycle, epochs, training cycle, backpropagation

    2. Data normalization, splitting data into sets, model creation, epochs, training cycle, backpropagation, loss function selection, additional metric selection, test cycle, validation cycle

    3. Data normalization, splitting data into sets, model creation, loss function selection, additional metric selection, epochs, training cycle, backpropagation, test cycle, validation cycle

  13. What does an epoch mean in the training process of artificial neural networks?

    1. All samples in the training set are processed once, and one training run can contain many epochs

    2. A data normalization method that removes extreme values

    3. All samples in the training set are processed once, and a training run can contain only one epoch

    4. The validation samples are considered after training

  14. If the numerical value of the MSE loss function is 0.5, then after one training epoch the numerical value will most likely be:

    1. 0.6

    2. 0.5

    3. 0.4

  15. An RNN is usually used to:

    1. Recognize several objects in an image

    2. Predict stock prices from market data

    3. Predict car prices from an advertisement

  16. A ConvNet without data augmentation during training is capable of recognizing:

    1. Objects moved within the image

    2. Objects moved and rotated in the image

    3. Objects moved, enlarged, and rotated in the image

  17. The weights W of a pre-trained GRU at each time step:

    1. are different

    2. are the same

    3. are not specified

  18. What is the dot product of matrices?

    1. A mathematical operation that yields a perpendicular vector or matrix between input vectors

    2. A mathematical operation that performs matrix transformation using multiplication in any dimensions

    3. An algorithm that uses addition and multiplication in the last 2 dimensions in any matrices

  19. What is a Linear layer or function in artificial neural networks?

    1. The vector product of a matrix

    2. Multiplication by a weight matrix followed by the addition of a bias

    3. The linear regression algorithm

  20. Why is batch normalization needed before the activation function?

    1. To prevent overfitting

    2. To prevent dead neurons

    3. To prevent bias towards one class in predictions

  21. Which statement best captures the core goal of an auto-encoder in unsupervised learning?

    1. Map input data directly to class labels using labeled examples.

    2. Learn a compressed internal representation that can faithfully reconstruct the original input.

    3. Separate data into linearly separable clusters by maximizing class margins.

    4. Generate adversarial perturbations that fool a downstream classifier.

  22. Why is batch normalization needed before the activation function?

    1. To prevent overfitting

    2. To prevent dead neurons

    3. To prevent bias towards one class in predictions

  23. Which statement best describes how the latent z-dimension in an auto-encoder compares with the components obtained from Principal Component Analysis (PCA) when used for dimensionality reduction?

    1. The z-dimension must always equal the original input size, whereas PCA can use any smaller number of components.

    2. PCA components are strictly linear projections, while the z-dimension of an auto-encoder can capture nonlinear relationships determined by the network architecture.

    3. PCA requires iterative back-propagation to learn its components, but an auto-encoder derives its z-dimension analytically in closed form.

    4. Increasing the number of PCA components inevitably increases reconstruction error, whereas enlarging the z-dimension in an auto-encoder always increases it.

  24. Which task is best handled by a Convolutional Neural Network?

    1. Predicting the next word in a sentence

    2. Classifying handwritten digits in grayscale images

    3. Forecasting daily stock prices from a time series

    4. Recommending movies based on user-item ratings

  25. For which application is a Recurrent Neural Network the most appropriate choice?

    1. Sorting a batch of images by dominant color

    2. Detecting sentiment in a stream of tweets

    3. Mapping fixed-length feature vectors to output classes with no temporal order

    4. Segmenting objects in high-resolution satellite photos

  26. A plain feed-forward neural network (no weight sharing or recurrence) is generally the most suitable for:

    1. Predicting pixel values in a 2-D image grid

    2. Translating an English sentence into French word-by-word

    3. Mapping a set of tabular patient features to a disease probability

    4. Generating the next frame of a video sequence

  27. Why can accuracy be misleading on a highly imbalanced classification dataset?

    1. It penalizes false positives more than false negatives

    2. A model that predicts only the majority class can still achieve high accuracy

    3. Accuracy ignores true negatives entirely

    4. Accuracy is not affected by class distribution at all

  28. In a binary confusion matrix, which cell counts samples that were actually positive and predicted positive?

    1. True Positive

    2. False Positive

    3. True Negative

    4. False Negative

  29. Precision is computed with which formula?

    1. TP / (TP + FN)

    2. TP / (TP + FP)

    3. TN / (TN + FP)

    4. (TP + TN) / (TP + TN + FP + FN)

  30. A learning rate that is too large in SGD most often causes:

    1. Slow convergence but stable training

    2. Over-regularization of the model

    3. Oscillation or divergence of the loss

    4. Vanishing gradients in early layers

  31. Which mathematical expression best describes the ReLU activation?

    1. max(x,0)

    2. tanh(x)

    3. 1 / (1 + e^(-x))

    4. sign(x)

  32. Why must any useful hidden-layer activation be non-linear?

    1. Non-linearity allows weight decay to work.

    2. Without it, stacked layers collapse to a single linear transform.

    3. It guarantees convex loss surfaces.

    4. GPUs require non-linear math for speed.

  33. A dataset contains a binary “Customer_Type” column with values {“Retail”, “Wholesale”}. The simplest correct preprocessing step is:

    1. Scale to [-1, 1].

    2. One-hot encode into two columns.

    3. Embed into a 128-dimensional vector.

    4. Drop the column because only two categories exist.

  34. Which statement about category embeddings is false?

    1. They are learned jointly with the rest of the network during back-propagation.

    2. They can capture similarity (e.g., “France” closer to “Germany” than to “Australia”).

    3. Their dimensionality must equal the number of unique categories.

    4. They allow the network to generalise to unseen combinations of categorical values with numeric features.

  35. Why is a softmax layer typically added to the end of a multi-class classifier?

    1. To ensure each logit stays within the range [0, 1] before training starts

    2. To map raw logits into a normalized probability distribution that sums to 1

    3. To prevent exploding gradients during back-propagation

    4. To speed up matrix multiplications in the final layer

  36. In categorical cross-entropy, the target labels provided to the loss function should be

    1. Raw integer class indices (e.g., 0, 1, 2)

    2. One-hot encoded vectors or probability distributions

    3. Random noise to encourage regularization

    4. Log-scaled class frequencies

  37. Your regression targets are contaminated with rare but very large sensor glitches. You need a loss that minimally amplifies those extreme errors.

    1. L2 (mean-squared error)

    2. L1 (mean-absolute error)

    3. Huber loss with a small δ

    4. Binary cross-entropy

  38. Which situation most clearly calls for supervised learning?

    1. You have 60 000 chest X-ray images, each labeled “normal” or “pneumonia,” and you want to build a model that diagnoses new scans.

    2. You have 10 000 unlabeled chest X-ray images and want to discover visual groupings automatically.

    3. You wish to compress a 512-dimensional sensor signal into 32 features.

    4. You want to remove noise from speech recordings without any paired clean examples.

  39. A bank wants to spot previously unseen types of fraudulent transactions in a huge unlabeled dataset. Which is the most appropriate first step?

    1. Train a labeled classifier on past fraud cases

    2. Apply clustering or anomaly-detection (unsupervised) methods

    3. Use a value-function estimator (reinforcement learning)

    4. Fit a regression model to transaction amounts

  40. Which statement about supervised vs. unsupervised learning is true?

    1. Supervised models never require validation data.

    2. Unsupervised learning cannot be evaluated quantitatively.

    3. Supervised learning needs labeled targets; unsupervised does not.

    4. Unsupervised learning always outperforms supervised learning on classification tasks.


2. Task

List and describe all the steps needed to train a cat-and-dog photo classifier using PyTorch. You have been given 800 photos of cats and 200 photos of dogs. You may not use a pre-trained model. You need to create, train, and deploy the model in production, where it will be used to distinguish photos of cats from photos of dogs. Whenever you mention a keyword such as “model” or “loss function”, give the exact name and a description of the specific model, loss function, etc. that you would use for this task.

Your answer should name the exact methods for:

  1. Dataset pre-processing before training

  2. Dataset pre-processing during training

  3. Measuring the performance of the model

  4. Model architecture and why?

  5. Loss function and why?

  6. Optimizer and why?

  7. Training protocol, what to measure, when to stop

  8. How to use the model for inference