Monday, November 17, 2014

Using Pivot Table to summarize large numerical data in Microsoft Excel

Sometimes, you have large amounts of data in a spreadsheet. The data are related in such a way that separate cells are linked through a corresponding attribute, which appears in an adjacent cell or within the same row. Take for example the table here:

Let's say you want to have a smaller summary table that tells you how much capacity does each of the three grids (Luzon, Visayas, and Mindanao) have. If this is in a spreadsheet application like Microsoft Excel or Google Sheets, you can use Sort to sort the table by the grid column, then insert a row at the last entry for each group. In the corresponding column, you can use the function SUM. You can then create another sheet in the same file so that you can create a table with five rows and two columns. First row is header, last row is totals. It might look like this:

GridDependable Capacity
Luzon
Visayas
Mindanao
Total

However, what if you want to have another way of summarizing it? For example, you want to have a column that says, for each grid, show how much capacity is for Fossil, and how much is for Renewable? Something like this:

FossilRenewableSub-total
Luzon------
Visayas------
Mindanao------
Sub-total------

The task becomes a little bit harder. Of course, this can become even more difficult if you want to report per energy resource, sub-categories of Fossil and Renewable, like this:

FossilRenewableTotal
CoalOilNatural GasFossil Sub-totalHydroGeothermalSolarWindRenewable Sub-total
Luzon--------------------
Visayas--------------------
Mindanao--------------------
Sub-total--------------------

Of course, we are arranging this based on how you think is the logical way of presenting the data. What if you presented it to your supervisor, who says you should present it this way instead:

LuzonVisayasMindanaoSub-total
FossilCoal--------
Oil--------
Natural Gas--------
Fossil Sub-total--------
RenewableHydro--------
Geothermal--------
Solar--------
Wind--------
Renewable Sub-total--------
GRAND TOTAL--------

While theoretically, it should not be all that difficult, considering it is just a matter of re-organizing the data that were already calculated, there is actually a feature in spreadsheet applications that allows you to do automatic summarizing of numerical data (in sums and averages, among others) and arrange them by certain identifiable categories. It is called the Pivot Table tool.

With the advent of mobile version of applications, such as Android app of Google's Sheets, Pivot Table is one of those features that have yet to be made available to the mobile version. I guess it is a bit too sophisticated to be done in a mobile operating system. Nonetheless, in Google Sheets, if you have created it already in the web (desktop) version, you can make changes to the source sheet, and it will be updated in the pivot table sheet.

Pivot Table is a tool in spreadsheet applications (available in Microsoft Excel, LibreOffice Calc, and Google Sheets. It allows you to summarize large sets of numerical data by using columns that have repeating or grouping data. In the table, it refers to the columns on "Grid," "Source Type," "Clean or Not Clean," and "Source."

Before using Pivot Table


Pivot Table will use the values in the first row of the worksheet as labels of the output summary table (called the Pivot Table Report), so you must make sure that the data in the worksheet source are appropriately labeled.

Also, you have to ensure that the data is "clean," meaning you are sure that each cell contains the data that should be there. Your source table must be organized in such a way that each row is one record. We will discuss entity and data cleaning in the future, but for now, just ensure that one row corresponds to one record that is unique. Among others, this means that the data in that cell corresponds to that record (row) in that kind of variable (label or column).

Also, if you intend to make mathematical calculations, make sure that all numbers are formatted as numbers. If you have a cell wherein you manually inserted a comma, for example, your spreadsheet application will convert that number into text. If you need to use a comma for thousands and millions places, you can do this by using the cell formatting option of your spreadsheet program.

In this post, we will discuss how to do that in Excel. I will have separate posts for Calc and Google Sheets.

With Microsoft Excel


If you are using Microsoft Excel 2010, you can create a pivot table for your data by going to the Insert tab (Just make sure your cursor is in a cell within the table you want to summarize.), and then click Pivot Table.



You will be asked where you will source your data and where you want to put the Pivot Table Report. It should look like this:


Assuming you choose "New Worksheet," (and I recommend this, so that you can change the arrangement of the source table without affecting the output Pivot Table Report), and after you click "OK," you will go to the new sheet, with the column labels on the right side of the screen. Put a check on the labels that you want to include in your report.




As you see above, merely checking the labels will just add the labels in the "Row Labels" field. You have to move the labels to where you want it to display. In our example, we want each Grid to have a separate column, so we will move "Grid" to the "Column Labels" field, like so (You can drag and drop the label):

(In Excel, it can automatically detect that you want to add the values in the "Dependable Capacity" column, because the contents of that label are numerical values. However, you can also tell your spreadsheet application to summarize by using Average, or Count, Maximum, or Minimum, among others, depending on your spreadsheet application.)


If you want to refine your output by removing some specific values (e.g., you want to remove power plants that are in Mindanao), you can click the downward arrow. It will display a menu with finer controls.


Microsoft Excel allows you to produce a Pivot Table Report using multiple sheets, but I think that is rarely used in teaching. In case you want to know how to do that, you can read the official support article here.

Use of Pivot Table


Aside from summarizing, Pivot Table allows you to see possible relationships among the variables in your data because they are summarized. While the apparent relationships may not have the same confidence level as using statistical tools, such as Spearman Rho correlation, it can give you an idea which variables to test.

Wanna give it a try? You can test your knowledge by downloading clean data from The World Bank's Open Data initiative, available here.

Read how to use pivot table in LibreOffice Calc
Read how to use pivot table in Google Sheets

No comments:

Post a Comment