Monday, November 17, 2014

Using Pivot Table in Calc

Sometimes, you have large amounts of data in a spreadsheet. The data are related in such a way that separate cells are linked through a corresponding attribute, which appears in an adjacent cell or within the same row. Take for example the table here:

Let's say you want to have a smaller summary table that tells you how much capacity does each of the three grids (Luzon, Visayas, and Mindanao) have. If this is in a spreadsheet application like Microsoft Excel or Google Sheets, you can use Sort to sort the table by the grid column, then insert a row at the last entry for each group. In the corresponding column, you can use the function SUM. You can then create another sheet in the same file so that you can create a table with five rows and two columns. First row is header, last row is totals. It might look like this:

GridDependable Capacity
Luzon
Visayas
Mindanao
Total

However, what if you want to have another way of summarizing it? For example, you want to have a column that says, for each grid, show how much capacity is for Fossil, and how much is for Renewable? Something like this:

FossilRenewableSub-total
Luzon------
Visayas------
Mindanao------
Sub-total------

The task becomes a little bit harder. Of course, this can become even more difficult if you want to report per energy resource, sub-categories of Fossil and Renewable, like this:

FossilRenewableTotal
CoalOilNatural GasFossil Sub-totalHydroGeothermalSolarWindRenewable Sub-total
Luzon--------------------
Visayas--------------------
Mindanao--------------------
Sub-total--------------------

Of course, we are arranging this based on how you think is the logical way of presenting the data. What if you presented it to your supervisor, who says you should present it this way instead:

LuzonVisayasMindanaoSub-total
FossilCoal--------
Oil--------
Natural Gas--------
Fossil Sub-total--------
RenewableHydro--------
Geothermal--------
Solar--------
Wind--------
Renewable Sub-total--------
GRAND TOTAL--------

While theoretically, it should not be all that difficult, considering it is just a matter of re-organizing the data that were already calculated, there is actually a feature in spreadsheet applications that allows you to do automatic summarizing of numerical data (in sums and averages, among others) and arrange them by certain identifiable categories. It is called the Pivot Table tool.

With the advent of mobile version of applications, such as Android app of Google's Sheets, Pivot Table is one of those features that have yet to be made available to the mobile version. I guess it is a bit too sophisticated to be done in a mobile operating system. Nonetheless, in Google Sheets, if you have created it already in the web (desktop) version, you can make changes to the source sheet, and it will be updated in the pivot table sheet.

Pivot Table is a tool in spreadsheet applications (available in Microsoft Excel, LibreOffice Calc, and Google Sheets. It allows you to summarize large sets of numerical data by using columns that have repeating or grouping data. In the table, it refers to the columns on "Grid," "Source Type," "Clean or Not Clean," and "Source."

Before using Pivot Table


Pivot Table will use the values in the first row of the worksheet as labels of the output summary table (called the Pivot Table Report), so you must make sure that the data in the worksheet source are appropriately labeled.

Also, you have to ensure that the data is "clean," meaning you are sure that each cell contains the data that should be there. Your source table must be organized in such a way that each row is one record. We will discuss entity and data cleaning in the future, but for now, just ensure that one row corresponds to one record that is unique. Among others, this means that the data in that cell corresponds to that record (row) in that kind of variable (label or column).

Also, if you intend to make mathematical calculations, make sure that all numbers are formatted as numbers. If you have a cell wherein you manually inserted a comma, for example, your spreadsheet application will convert that number into text. If you need to use a comma for thousands and millions places, you can do this by using the cell formatting option of your spreadsheet program.

In this post, we will use LibreOffice or OpenOffice.Org Calc.

With LibreOffice Calc

For LibreOffice Calc, you can access the Pivot Table function by going to Data menu and selecting Pivot Table, like below. To streamline the process, put the cursor in the table where you know you will source the Pivot Table Report.


Like in Excel, Calc will ask where you want to get the data from:


Calc will then show you where you can place your labels. You can drag and drop the labels.

(In Calc, you have to make sure that you select the appropriate label in the "Data Fields." This is where the values will be summarized, which maybe summation, counting, maximum, or minimum. To change what type of operation you want to do, click the column label in Data Fields, choose, and then click "OK.")


 If you have multiple level labels (e.g., in rows, you will display first the Type of Source, then Source), you have to arrange it like with the higher level positioned first. Notice below that in the "Row Fields," the label "Source Type" is positioned first.


Your Pivot Table Report should look like this:


Like in Excel, if you want to refine the Report, just click on the downward arrows. You can also tell Calc to include only certain records based on certain value parameters, like include only records where the Dependable Capacity is greater than 10. To do that, just click the A1 Cell, and it will open up following Filter dialog box:


Unfortunately, if you make any changes in the source table, it won't automatically update the Pivot Table Report. But it can easily be updated. Just right-click any cell within the Report, and select "Refresh."

Using that process, you can also see the "Edit Layout." That will allow you to change the placement of labels, in case your supervisor wants to re-arrange the columns.

Click here for more references about LibreOffice Calc's Pivot Table.

On how to do this with Google Sheets, click here.

Use of Pivot Table


Aside from summarizing, Pivot Table allows you to see possible relationships among the variables in your data because they are summarized. While the apparent relationships may not have the same confidence level as using statistical tools, such as Spearman Rho correlation, it can give you an idea which variables to test.

Wanna give it a try? You can test your knowledge by downloading clean data from The World Bank's Open Data initiative, available here.

No comments:

Post a Comment