DataViz Makeover 2

This DataViz Makeover critiques and improves two visualizations that portray the results of a study on the public's willingness to take COVID19 vaccinations.

Author: Andre Lee

Published: Feb. 18, 2021

Given Visualization

a)

Clarity

s/n	Comment	Suggestion
1	The legend title ‘Vac1’ is meaningless.	Rename title ‘Vac1’ to ‘Response’.
2	The ‘% of strongly agreed to vaccination’ chart is contained within the ‘Which country is more pro-vaccine’ chart. It can be reproduced merely by changing the way the latter chart is presented.	Remove the ‘% of strongly agreed to vaccination’ chart, as it adds nothing new to the first chart.
3	Following on from comment 2, it is not obvious from the titles of the two charts whether they refer to the same survey question. The chart on the left is titled ‘Which country is more pro-vaccine’ while the chart on the right is titled ‘% of strongly agreed to vaccination’. Inspecting and comparing the bars shows that they visualize the same question, but the titles alone do not make that clear.	If two charts refer to the same survey question, their titles should make that clear.
4	The visualization shows responses only as a proportion of each country’s respondents and gives no insight into the absolute number of respondents. If the sample size for a particular country is small, the results for that country may not be representative.	Add the absolute number of respondents and, where possible, statistical measures such as error bars.
5	The legend does not explicitly explain what responses 2, 3 and 4 represent.	Label 2 as Slightly Agree, 3 as Neutral and 4 as Slightly Disagree.
6	The x-axis label is unclear. For example, ‘% of Total Record’ could be read as the % of all respondents by someone who does not know the background of the data.	Rename ‘% of Total Record’ to ‘% of each country’s respondents’.

Aesthetics

s/n	Comment	Suggestion
7	For the ‘Which country is more pro-vaccine?’ chart, the responses run from ‘5’ on the left to ‘1’ on the right. This is the reverse of the order in the legend, which is not visually intuitive.	Show the responses from 1 on the left to 5 on the right.
8	The x-axis of the left chart is in whole numbers while the x-axis of the right chart is to 1 decimal place. The extra decimal place is unnecessary because the x-axis intervals are already wide.	Give both x-axes the same whole-number precision.
9	It is aesthetically odd for the ‘united kingdom’ bar to extend beyond the maximum value of the x-axis.	Keep all chart values within the maximum bounds of the axes.
10	If the x-axis title already states that the unit is %, there is no need to label each tick with %.	Remove % from the x-axis tick labels.

b)

Sketch of proposed design

Advantages of proposed design

s/n	Advantage
1	The Likert Scale chart is well suited to visualizing attitude or belief items, such as the willingness to receive the COVID19 vaccine in this study. Because it is centered on 0, it is visually intuitive to see at a glance whether a country’s respondents lean towards agreeing or disagreeing.
2	By allowing filtering by profile (gender, age group, household size bracket, etc.), the user can derive more pointed insights instead of a static view of the entire population.
3	The Error Bars in the Dot Plot show visually how certain one can be of the study results, given the data for each group of respondents. They convey the sampling variability of each estimate and give a rough indication of where results from new samples might fall.
4	Letting the user select from a list of questions about public perception of the vaccine in different contexts gives a variety of insights into the issue, rather than the single perspective offered by the original visualization.

c)

Click for link to Tableau Public post

d)

For this makeover, the visualization comes in the form of 2 charts: 1) Likert Scale and 2) Dot Plot with Error Bars.

We go through the Data Source Preparation steps first before going through the steps to build the 2 charts proper.

Data Source Preparation

Download the data for the relevant countries from https://github.com/YouGov-Data/covid-19-tracker/tree/master/data
Open ‘australia.csv’ as the first Data Source.
Remove the ‘australia.csv’ table from the canvas as shown in the screengrab below, so that a union can be created in its place.

Create a New Union comprising the 14 tables for the 14 countries

Fig 2: New Union to join 14 country tables

Create new aliases for the countries under ‘Table Name’ field.

Rename ‘Table Name’ field to ‘Country’.
Create aliases for the responses to all the survey questions (vac_1, vac_2, vac2_1, vac2_2, vac2_3, vac2_6, vac_3) as shown below. This makes it easier to build the calculated fields later on, whose formulas use inequalities on the survey responses.

Inspect the data. On inspection, Sweden has no data in the union because the ‘RecordNo’ field, which all the other files share, is saved as ‘record’ in the Sweden file. Rename the field in the raw ‘sweden.csv’ file and redo the union, then check that the Sweden data is now included.
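The union and the Sweden header fix described above can also be reproduced outside Tableau. The following is a minimal Python sketch using only the standard library. The in-memory file contents, the sample values and the helper name `union_tables` are illustrative assumptions, not part of the YouGov data or of the Tableau workflow.

```python
import csv
import io
import os

# Hypothetical miniature stand-ins for two of the country files.
# Only the RecordNo/record mismatch mirrors the issue described above.
FILES = {
    "australia.csv": "RecordNo,vac_1\n1,1\n2,3\n",
    "sweden.csv": "record,vac_1\n1,5\n",
}


def union_tables(files, renames):
    """Stack the rows of every file and tag each row with its country."""
    rows = []
    for name, text in files.items():
        # Derive a readable country name from the file name.
        country = os.path.splitext(name)[0].replace("-", " ").title()
        for row in csv.DictReader(io.StringIO(text)):
            # Harmonise mismatched headers before stacking.
            fixed = {renames.get(key, key): value for key, value in row.items()}
            fixed["Country"] = country
            rows.append(fixed)
    return rows


rows = union_tables(FILES, renames={"record": "RecordNo"})
for row in rows:
    print(row)
```

Renaming the header at load time has the same effect as editing ‘sweden.csv’ by hand, but leaves the raw file untouched.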

Creating new parameters and fields to build the Likert Scale

Hide the unneeded fields from the data which we will not be using for the visualization. We retain the following fields:

Create new Calculated Field called ‘Age Group’ as follows:

Fig 7: Create new calculated field ‘Age Group’

The ages are binned accordingly so that viewers can view results by meaningful distinct age groups.

Create new Calculated Field called ‘Household Size Bracket’ as follows:

Fig 8: Create new calculated field ‘Household Size Bracket’

The household sizes are binned accordingly so that viewers can view results by meaningful household size brackets

Create new Calculated Field called ‘Number of Children’ as follows:

Fig 9: Create new calculated field ‘Number of Children’

The number of children in each household is binned accordingly so that viewers can view results by meaningful brackets of the number of children in each household.
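The three binning fields above follow the same pattern: compare a number against a few cut-offs and return a label. A minimal Python sketch of that pattern is given below. The cut-offs and labels are assumptions chosen for illustration (the actual brackets are defined in the calculated fields shown in the figures), and `make_binner` is a hypothetical helper name.

```python
from bisect import bisect_left


def make_binner(upper_bounds, labels):
    """Return a function that maps a number to the label of its bracket.

    upper_bounds[i] is the inclusive upper limit of labels[i]; the last
    label catches every value above the final bound.
    """
    if len(labels) != len(upper_bounds) + 1:
        raise ValueError("need exactly one more label than bounds")

    def binner(value):
        return labels[bisect_left(upper_bounds, value)]

    return binner


# Assumed brackets, for illustration only.
age_group = make_binner([30, 45, 60], ["30 and below", "31-45", "46-60", "Above 60"])
household_bracket = make_binner([2, 4], ["1-2", "3-4", "5 or more"])

print(age_group(30), "|", age_group(61), "|", household_bracket(3))
```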

Create a new Parameter called ‘Select Question’. This will drive a dropdown that lets the viewer select the question for which to view the results.

Fig 10: Create new parameter ‘Select Question’

Create Calculated Field called vac1_reversescore as follows:

Fig 11: Create new calculated field ‘vac1_reversescore’

This reverses the order of the scores, so that 5-Strongly Disagree now has a score of 1 while 1-Strongly Agree has a score of 5. The Likert Scale needs this ordering because scores below 3 (Neutral) are treated as negative responses. Create the same fields for all the other survey questions (vac_2, vac2_1, vac2_2, vac2_3, vac2_6, vac_3).

Make sure to label in the Response Legend that 1 corresponds to Strongly Disagree, 2 to Disagree, 3 to Neutral, 4 to Agree and 5 to Strongly Agree.
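The reverse scoring and relabelling described above amount to subtracting each raw score from 6. A minimal Python sketch follows, assuming the raw responses are the integers 1 to 5 with missing answers represented as `None`; the function name `reverse_score` is hypothetical.

```python
# Legend labels for the reversed scores, as listed above.
LABELS = {
    1: "Strongly Disagree",
    2: "Disagree",
    3: "Neutral",
    4: "Agree",
    5: "Strongly Agree",
}


def reverse_score(raw):
    """Flip a 1-5 response so that 5 means Strongly Agree.

    Missing answers (None) are passed through unchanged.
    """
    if raw is None:
        return None
    if raw not in (1, 2, 3, 4, 5):
        raise ValueError(f"unexpected response: {raw!r}")
    return 6 - raw


for raw in (1, 3, 5):
    print(raw, "->", reverse_score(raw), LABELS[reverse_score(raw)])
```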

Create new Calculated Field called ‘Selected Question’ as follows:

Fig 12: Create new calculated field ‘Selected Question’

Create Calculated Field called ‘Number of Records’ with a value of 1 as follows. It serves as a dummy that can be summed to count the number of records.

Fig 13: Create new calculated field ‘Number of Records’

Create Calculated Field called ‘Total Count’ as follows:

Fig 14: Create new calculated field ‘Total Count’

Create Calculated Field called ‘Count Negative’ as follows:

Fig 15: Create new calculated field ‘Count Negative’

Create Calculated Field called ‘Total Count Negative’ as follows:

Fig 16: Create new calculated field ‘Total Count Negative’

Create Calculated Field called ‘Percentage’ as follows:

Fig 17: Create new calculated field ‘Percentage’

Create Calculated Field called ‘Gantt Start’ as follows:

Fig 18: Create new calculated field ‘Gantt Start’

Create Calculated Field called ‘Gantt Percent’ as follows:

Fig 19: Create new calculated field ‘Gantt Percent’
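To make the role of the fields above concrete, the following Python sketch computes where each Gantt bar starts and how wide it is for one country. It assumes the common diverging-stacked-bar construction in which scores 1 and 2 count fully as negative and half of the Neutral responses count as negative, so that the Neutral bar straddles 0. The exact formulas used in the workbook are those in the figures, and the function name `likert_bars` is hypothetical.

```python
def likert_bars(counts):
    """Return (score, start, width) for each reversed score 1..5.

    counts maps a reversed score to the number of respondents in one
    country. Starts and widths are fractions of that country's total.
    """
    total = sum(counts.get(score, 0) for score in range(1, 6))
    if total == 0:
        raise ValueError("no responses to plot")

    # Everything left of zero: Disagree responses plus half of Neutral.
    negative = counts.get(1, 0) + counts.get(2, 0) + 0.5 * counts.get(3, 0)

    start = -negative / total  # left edge of the first bar
    bars = []
    for score in range(1, 6):
        width = counts.get(score, 0) / total  # share of respondents
        bars.append((score, start, width))
        start += width  # the next bar begins where this one ends
    return bars


example = {1: 10, 2: 10, 3: 20, 4: 30, 5: 30}
for score, start, width in likert_bars(example):
    print(f"score {score}: start {start:+.2f}, width {width:.2f}")
```

In this sketch the running `start` plays the part of the Gantt position and `width` the part of the percentage placed on Size.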

Building the Likert Scale

Drag Gantt Percent to Columns
Drag Country to Rows. Chart will first look as follows:

Drag Response to Detail under the Marks tab
Under Gantt Percent, select Compute using Response as follows:

Change Chart Type to Gantt Bar under Marks
Drag Response to Colour under the Marks tab
Drag Percentage to Size under the Marks tab. Chart will now look as follows:
Drag Score into Filters tab and uncheck Null so Null responses are disregarded throughout.

Show the Parameter ‘Select Question’. This shows up as a Dropdown Bar.
Change the Chart Title to be dynamic according to the question selected as follows:

Drag Age Group into Filters tab:

Show Age Group as a Multiple Values Dropdown list

Drag ‘gender’ into Filters tab:

Show gender as a Multiple values list

Drag employment_status into Filters tab:

Include Null as an allowed value, although it accounts for an insignificant number of responses. The user can uncheck it to exclude it if desired.

Show employment_status as a Multiple Values Dropdown list

Drag ‘Household Size Bracket’ into Filters tab:

Show Household Size Bracket as a Multiple Values Dropdown list

Drag ‘Number of Children’ into Filters tab:

Show Number of Children as a Multiple Values Dropdown list

Likert Scale is as follows, before any formatting:

Creating new parameters and fields to build the Dot Plot with Error Bars

Next, we build the Dot Plot with Error Bars. For each set of filters selected by the user, the Dot Plot shows the proportion of respondents from each country with a ‘Strongly Agree’ response, accompanied by error bars showing the 95% confidence intervals. This improves on the original ‘% of strongly agreed to vaccination’ chart: it allows filtering by profile and also shows the confidence intervals based on each country’s sample size.

Create Calculated Field called Strongly Agree Count as follows:

Fig 26: Create calculated field ‘Strongly Agree Count’

This is to mark each Strongly Agree response as 1 count. Create the same fields for the other 4 responses.

Create Calculated Field called ‘Proportion of Respondents_Strongly Agree’ as follows:

Fig 27: Create calculated field ‘Proportion of Respondents_Strongly Agree’

This calculates the proportion of respondents who selected Strongly Agree out of the total number of records. Create the same fields for the other 4 responses.

Create Calculated Field called ‘Proportion Standard Error_Strongly Agree’ as follows:

Fig 28: Create calculated field ‘Proportion Standard Error_Strongly Agree’

This is to calculate the standard error. Create the same fields for the other 4 responses.

Create Calculated Field called Z_95% as follows:

This is the critical z-score for a 95% confidence level.

Create Calculated Field called ‘Upper 95%_Strongly Agree’ as follows:

Fig 30: Create calculated field ‘Upper 95%_Strongly Agree’

This calculates the upper bound of the 95% confidence interval for the proportion of respondents who answered Strongly Agree. Create the same fields for the other 4 responses.

Create Calculated Field called ‘Lower 95%_Strongly Agree’ as follows:

Fig 31: Create calculated field ‘Lower 95%_Strongly Agree’

This calculates the lower bound of the 95% confidence interval for the proportion of respondents who answered Strongly Agree. Create the same fields for the other 4 responses.
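The proportion, standard error, z-score and interval fields above can be checked by hand. The sketch below assumes the usual normal-approximation interval for a proportion, p ± z·sqrt(p(1 − p)/n); whether the workbook uses exactly this form depends on the formulas in the figures. The function name `proportion_ci` and the sample counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Return (proportion, lower, upper) using the normal approximation."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    # Two-sided critical z-score, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Standard error of a sample proportion.
    se = sqrt(p * (1 - p) / n)
    return p, p - z * se, p + z * se


# Illustrative numbers only: 300 'Strongly Agree' out of 1000 respondents.
p, lower, upper = proportion_ci(300, 1000)
print(f"p = {p:.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

Because the standard error shrinks with the square root of n, countries with fewer respondents after filtering will show visibly wider error bars.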

Create Calculated Field ‘Lower 95% Level’ as follows:

Fig 32: Create calculated field ‘Lower 95% Level’

This is to generate the correct lower 95% level according to the type of response which the user wishes to see the lower 95% level for.

Create Calculated Field ‘Upper 95% Level’ as follows:

Fig 33: Create calculated field ‘Upper 95% Level’

This is to generate the correct upper 95% level according to the type of response which the user wishes to see the upper 95% level for.

Create Parameter ‘Select Type of Response’. This selects the type of response for which the Dot Plot with Error Bars is shown.

Fig 34: Create parameter ‘Select Type of Response’

Create Calculated Field called ‘Select Response Type’. This displays the proportion of respondents for the response type that the user selects with the ‘Select Type of Response’ parameter above.

Fig 35: Create calculated field ‘Select Response Type’
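The ‘Select Response Type’, ‘Lower 95% Level’ and ‘Upper 95% Level’ fields all do the same job: they look at the parameter value and return the matching per-response field. A minimal Python sketch of that dispatch is shown below. The numbers are made up for illustration, the option labels are assumed to follow the ‘1 - Strongly Disagree’ style, and `select_response` is a hypothetical name.

```python
# Hypothetical results for one country: (proportion, lower 95%, upper 95%).
RESULTS = {
    "1 - Strongly Disagree": (0.10, 0.08, 0.12),
    "2 - Disagree": (0.10, 0.08, 0.12),
    "3 - Neutral": (0.20, 0.18, 0.22),
    "4 - Agree": (0.30, 0.27, 0.33),
    "5 - Strongly Agree": (0.30, 0.27, 0.33),
}


def select_response(selected, results):
    """Return the (proportion, lower, upper) triple for the chosen response."""
    try:
        return results[selected]
    except KeyError:
        raise ValueError(f"unknown response type: {selected!r}") from None


proportion, lower, upper = select_response("5 - Strongly Agree", RESULTS)
print(proportion, lower, upper)
```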

Building the Dot Plot with Error Bars

Drag ‘Select Response Type’ to Columns
Drag Country to Rows
Change Marks type to Circle. At this point chart looks as follows:

Drag Measure values to the ‘top x-axis’ and then exclude everything except the Upper 95% Level and Lower 95% Level

Click on Measure values in the Marks tab and change Marks type to Line. This generates a squiggly line across the chart as follows:

Fig 38: Chart looks like a squiggly line initially

Drag Measure Names to Path to generate the line which represents the 95% confidence interval.

Right click x-axis and click Synchronize Axis

Format the colour of the Dot to make it different from the colour of the Error Bars

Compile both Likert Scale and Dot Plot with Error Bars into a single dashboard for integration and incorporate user-friendly interface

Create a new Dashboard
Drag sheets ‘Likert’ and ‘Final Error Bars’ into the dashboard side by side. It will look as follows:

To synchronize the order of the countries on both charts, go to the Likert sheet, right click on the y-axis and select Sort. Sort by Field, where the Field Name is ‘Select Response Type’, in Ascending order. For example, if the user chooses to view the Dot Plot for ‘1 - Strongly Disagree’, both charts will be sorted with the country with the lowest proportion of ‘Strongly Disagree’ at the top and the country with the highest proportion at the bottom.

Fig 41: Sort order of countries by select Response Type field

Move the Select Question dropdown list to the top. This is so that the entire question can be shown and not truncated.
Move the ‘Select Type of Response’ dropdown list below ‘Select Question’.
Drag all the profile filters (gender, Age group, Household Size Bracket, Number of children and employment status) to the bottom of the dashboard.
Create a text box to direct the user to filter to view results based on different profile combinations.

Fig 42: Create text box to direct user to filter by profiles

Capitalize first letter of each profile filter group to tidy the look.
Go to the Final Error Bars sheet. To show the number of records for each country for that particular question, drag ‘Number of Records’ into the All tab under Marks as follows:

Then edit the tooltip under the same All tab as above:

Change the title of the Dot Plot chart as follows:

Add Dashboard Title ‘A study to understand the willingness and perception of receiving the COVID19 vaccine across different countries’

Final product:

e)

Three major observations revealed by the data visualisation prepared

When asked whether they would get the vaccine if it became available in 2021, respondents in France were the most opposed, with 34% indicating that they strongly disagree. South Koreans were the least opposed, with only 6% indicating that they strongly disagree. It is also notable that the 3 countries with the lowest proportion of ‘Strongly Disagree’ are Asian countries, whereas the 3 countries with the highest proportion are all Western countries. This could be due to cultural perceptions of vaccines.

Fig 46: Comparing countries’ proportion of residents who Strongly Disagree to receiving COVID19 vaccine in 2021

Females are noticeably more worried than males about the potential side effects of a vaccine. Taking 30% as a reference point, more than 30% of females indicated ‘Strongly Agree’ in 11 out of the 14 countries. In contrast, more than 30% of males did so in only 2 out of the 14 countries.

This is consistent with research suggesting that females are generally more risk-averse than males.

Fig 47: In 11 out of 14 countries, more than 30% of Females strongly agree that they are worried about potential side effects of the vaccine

Fig 48: In only 3 out of 14 countries, more than 30% of Males strongly agree that they are worried about potential side effects of the vaccine

Generally, the elderly are more worried than the young about getting COVID19. Here the elderly are taken as those above 60 years old and the young as those up to 30 years old. Using 20% as a reference point, more than 20% of the elderly strongly agreed that they are worried about getting COVID19 in 11 out of 14 countries. In contrast, more than 20% of the young did so in only 4 out of the 14 countries.

This is probably because published statistics indicate that the elderly are more likely to be severely affected by COVID19 and have a higher death rate than the young, who tend to recover more often and more quickly.