In the last chapter practice, we looked at FDA Recall Data and practiced making sense of data we have never seen before. In this chapter, we are set to work with an Excel file called CFPB_Truncated_Complaints.
Let’s start with the context because having a general understanding of what we will find in the data will allow us to process it faster.
Answer the following questions:
- What does the file name reveal? What is CFPB? What do you know of its work?
> Google or Bing the CFPB, read up on what they do, read up on recent news articles related to them. - What kinds of complaints do you expect to find in a file from the CFPB?
> Consider the context and develop a general idea. - Next, open the data file. What is the source? Have there been any edits made to it?
> Note how the data file has been downsized to a specific year, quarter. What does it suggest of the total file size that a quarter alone makes up so many records? - Is there a data dictionary? Do the field names make sense without one? What can you do if there is no data dictionary?
> Note the descriptive field names and the data types in each column. If you need a data dictionary, add a table to the Info sheet in the workbook. - How much data is in the workbook? How many fields? How many records? How many cells with contents?
> Highlight columns, rows, the entire range with data. You’ll find 13 columns/field names, 7608 rows/records, 92081 cells with contents. Some are dates, some are text, some are numbers. - What types of questions can you answer with the data?
> How many complaints are there per category? How many complaints over time altogether? What is the average complaint per category? How many complaints arrive via fax? - What functions do you know in Excel that will help you process data?
> Use all your skills from CTRL+F to Excel tables to PivotTables to answer the questions below.
EXERCISES
Answer the following questions using a variety of features and functions in Excel. A variety of solutions are shown in the section below. Try to answer these questions independently and check your answers against other
- What date was complaint 2845603 received?
- What product is the complaint with the ID number of 2770577 is regarding?
- Which Product has the most complaints? Which has the least?
- Which sub-product has the most complaints? The least?
- How many complaints are there about virtual currency?
- The highest volume of complaints is submitted via… The lowest?
- In the state of Texas, which Company is the most complaints about?
- What does the following output reveal?
Solutions:
- Complaint 2845603 was received 3/16/2018.
> Scroll OR CTRL+F OR filter an Excel table. - Complaint ID number 2770577 was regarding a Mortgage product.
> Scroll OR CTRL+F OR filter an Excel table. - Credit reporting, credit repair services, or other personal consumer reports has the most complaints. Money transfer, virtual currency, or money service has the least.
> Use Subtotals in a range OR create a PivotTable. - The Credit reporting sub-product has the most complaints. There are 5 products with one complaint each.
> Use Subtotals in a range OR create a PivotTable. - There are 48 complaints about virtual currency.
> Filter an Excel table OR search your PivotTable from the last step OR filter a new PivotTable. - The highest volume of complaints is submitted via Web. The lowest via Postal mail.
> Use Subtotals in a range OR create a PivotTable. - In the state of Texas, EQUIFAX, INC. is the company with the most complaints. There are several companies with only one complaint.
> Filter an Excel table OR filter a PivotTable. - The output shows the complaints about JPMorgan Chase in the state of Texas.
> The pivot table filters the number of complaints for each Products by State and Company.
Now, with 100% more charts!
Let’s build on our knowledge from Chapter 4 about charts, graphs, and data visualization to create even more powerful summary data to tell a story based on our numbers. The PivotChart below summarizes over 7,600 pieces of data and it depends on Excel’s ability to condense all of those numbers into a simple visual feature in a way that a standard chart couldn’t.
If we want to show only a subset of our data, we can filter from the PivotTable end as shown below:
Alternatively, we can filter from the PivotChart side as well by using the Filter inside the chart area and unchecking items we don’t wish to include. We also use the Labels to add values and show percentages to compare the relative volume of complaints about each product that met the criteria for the number filter.
Can you write a summary sentence based on the chart below?
Media Attributions
- CFPB_Output
- PivotChart
- Q3ChartFilterA
- Q3ChartFilterB
- Q3ChartMoreThan250