Monday: Quiz
Exploratory data analysis and clustering of socioeconomic data
This quiz is a part of the accompanying material for training in data science and machine learning for Houston's STEM teachers.
Quiz
The following questions are related to the Human Development Index (HDI) dataset. Solve them in Orange. Start by loading the dataset using the Datasets widget.
How many data instances (countries) are included in the HDI data set? (1pt.)
Which of the following is NOT a meta feature in the HDI data set? (1pt.)
In which country does schooling take the longest? (1pt.)
In the HDI data set, find the three countries with the highest life expectancy. One of the following countries is NOT one of them. Which one is it? (1pt.)
If we compare average years of schooling with life expectancy, we can see an outlier. Which country has a relatively low life expectancy despite a high average number of years of schooling? (1pt.)
Use Data Table, Distributions, or Select Rows to select the countries according to some feature-related criteria.
Select the countries with an HDI below 0.5. Then use the Box Plot widget to find the feature on which these countries differ most from the rest of the world. Which of the following features comes out on top (or close to it)? (1pt.)
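For readers who want to see the idea behind this step outside Orange, the following is a minimal Python sketch of splitting rows on a threshold and ranking features by how much the two groups differ. It is an illustration only: the column names `hdi`, `x`, and `y` and all values are made up, and the score (absolute difference of group means divided by the overall standard deviation) is a simple stand-in chosen here, not necessarily the statistic the Box Plot widget uses for its ordering.

```python
from statistics import mean, pstdev

def rank_group_differences(rows, flag, features):
    """Rank features by |mean(selected) - mean(rest)| in overall-std units."""
    ranking = []
    for f in features:
        inside = [r[f] for r in rows if flag(r)]
        outside = [r[f] for r in rows if not flag(r)]
        spread = pstdev([r[f] for r in rows]) or 1.0  # guard constant columns
        ranking.append((f, abs(mean(inside) - mean(outside)) / spread))
    return sorted(ranking, key=lambda t: t[1], reverse=True)

# Hypothetical toy rows; these are NOT values from the HDI data set.
toy_rows = [
    {"hdi": 0.40, "x": 1, "y": 10},
    {"hdi": 0.45, "x": 2, "y": 12},
    {"hdi": 0.70, "x": 9, "y": 11},
    {"hdi": 0.90, "x": 10, "y": 13},
]

# Select rows with hdi below 0.5 and compare them with the rest.
group_ranking = rank_group_differences(toy_rows, lambda r: r["hdi"] < 0.5, ["x", "y"])
for name, score in group_ranking:
    print(f"{name}: {score:.2f}")
```

In this toy example the feature `x` separates the two groups far more clearly than `y`, so it comes out on top of the ranking.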
To answer the following set of questions, start by creating a clustering of countries using the normalized Euclidean distances in the Distances widget, and use the Ward linkage for hierarchical clustering.
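As a rough picture of what this workflow computes, here is a small pure-Python sketch of Ward clustering on normalized Euclidean distances. It rests on several assumptions: "normalized" is taken to mean z-scoring each feature (Orange's exact scaling may differ), the agglomeration is a naive Lance-Williams implementation written for readability, and the six rows are invented values for hypothetical countries A to F, not HDI data.

```python
from statistics import mean, pstdev

def zscore_columns(rows):
    """Standardize each column to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]

def pair(a, b):
    """Return the two cluster ids as an ordered dictionary key."""
    return (a, b) if a < b else (b, a)

def ward_clusters(rows, k):
    """Naive Ward agglomeration (Lance-Williams update) down to k clusters."""
    pts = zscore_columns(rows)
    clusters = {i: [i] for i in range(len(pts))}
    # Squared Euclidean distances between all pairs of singleton clusters.
    d = {(i, j): sum((a - b) ** 2 for a, b in zip(pts[i], pts[j]))
         for i in clusters for j in clusters if i < j}
    while len(clusters) > k:
        i, j = min(d, key=d.get)  # cheapest merge under Ward's criterion
        ni, nj, dij = len(clusters[i]), len(clusters[j]), d[(i, j)]
        for m in clusters:
            if m in (i, j):
                continue
            nm = len(clusters[m])
            # Lance-Williams update for the merged cluster (kept under id i).
            d[pair(i, m)] = ((ni + nm) * d[pair(i, m)]
                             + (nj + nm) * d[pair(j, m)]
                             - nm * dij) / (ni + nj + nm)
        clusters[i].extend(clusters.pop(j))
        d = {p: v for p, v in d.items() if j not in p}
    return sorted(sorted(c) for c in clusters.values())

# Hypothetical toy data: two made-up features for six made-up countries.
labels = ["A", "B", "C", "D", "E", "F"]
toy = [[82, 13], [81, 12.5], [72, 8], [71, 8.5], [58, 4], [60, 3.5]]

clusters = ward_clusters(toy, 3)
named = [[labels[i] for i in c] for c in clusters]
print(named)
```

Cutting the toy hierarchy at three clusters groups the rows into pairs with similar values, which mirrors dividing the dendrogram into three clusters in the Hierarchical Clustering widget.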
Divide the countries into three clusters. Call the clusters 'developed,' 'less developed,' and 'underdeveloped.' In which cluster do we find Cuba? (1pt.)
What feature (from those included in the list below) most distinguishes the three clusters? (1pt.)
Hint: to display correlations with just one selected variable, use the Correlations widget and, instead of (All Combinations), choose the combinations that include the variable in question.
The variable that is most strongly correlated (in terms of the absolute value of Pearson's correlation) with the correct answer to the previous question is 'dependency ratio young age'. What is the second most highly correlated variable? (1pt.)
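Ranking by the absolute value of Pearson's correlation can be sketched in a few lines of Python. This is only an illustration of the ranking rule: the variable names `feature_a`, `feature_b`, and `feature_c` and their values are hypothetical and unrelated to the HDI data, and the helper functions are written here for the example.

```python
import math

def pearson(x, y):
    """Pearson's correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def rank_by_abs_correlation(target, features):
    """Sort features by |r| with the target, strongest first."""
    scores = {name: pearson(target, values) for name, values in features.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Hypothetical toy values, chosen only to show the ranking.
target = [1, 2, 3, 4, 5]
features = {
    "feature_a": [10, 8, 6, 4, 2],   # perfectly negatively correlated
    "feature_b": [1, 2, 2, 4, 5],    # strongly positively correlated
    "feature_c": [3, 1, 4, 1, 5],    # weakly correlated
}

corr_ranking = rank_by_abs_correlation(target, features)
for name, r in corr_ranking:
    print(f"{name}: r = {r:+.3f}")
```

Because the ranking uses the absolute value, a strongly negative correlation such as that of `feature_a` ranks above a weaker positive one.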
In the dendrogram, locate the cluster of ten countries that includes North African countries, several Middle Eastern countries, and Iran. On which variable do these countries differ most from the rest of the world? (If the name is too long and part of it is cut off, hover the mouse over it to see the tooltip.) (1pt.)
Choropleth Map should be very helpful in answering this question. First, feed the selected data from Hierarchical Clustering into the Geocoding widget and use Choropleth Map to display the location of the countries on the world map.
In a clustering into three groups, consider the most developed cluster. It consists of two sub-clusters, which we could call 'Western' and 'Eastern'. Which of the following groups appears in the 'wrong' subcluster? (1pt.)
Hint: to select two clusters, click on one and then shift-click on the other. The clusters are labeled C1 and C2 in the dendrogram, and this labeling is retained in the data set at the output of the Hierarchical Clustering widget.