Sunday, August 30, 2015

Cloud Computing Now

During the presentation of our Information System Strategic Plan, I was asked about what system I was using for our Performance Information System (v.1). The question was raised as an item in discussion was about the Knowledge Warehouse, which was supposed to make all our separate information systems and databases inter-operable (i.e., data can be shared with one another easily, and these separate systems can can “talk” to one another). Without going into the details of the system I am developing, I said that it is Google-based, and so, cloud-based.

There were different reactions to the idea of the system being cloud-based. While there were expressions that making something cloud-based made data available, most of the following expressions were on security and where the data will be actually stored. Obviously, I cannot expect top-level managers and executives (including the oversight official for our information systems) to understand the concept of cloud computing in a precise manner (cloud computing as a concept has slightly differences depending on the source, I must admit). Hence, I thought I have to update my knowledge about this concept, just in case somebody asks me.

I wrote briefly about cloud computing before here. Back then, cloud computing in the Philippines was not that popular. Actually, even globally. Cloud computing as a term only became popular then, with people promoting it emphasizing economy and allowing the business to focus on its core functions rather than get bogged down with setting up their IT requirements (which big companies usually require of their supply chain partners). Those against it emphasize (just like in the paragraph before) security and location of data. Since then (about five years past), a lot of developments have happened. The Wikipedia article is a testament to that.

On September 10, 2009, the Wikipedia article on cloud computing described the concept as a “paradigm of computing in which dynamically scalable and often virtualized resources are provided as a service over the Internet.” If you would access the same article today (August 30, 2015), it says that “Cloud computing is a model for enabling ubiquitous network access to a shared pool of configurable computing resources.” I agree with both, actually, but the latter definition has some terms which are highly technical, so I prefer the former.

As I said in my blog post in 2009, for me, it is just a way of computing (or doing your work with your computer) with your data or your applications--or both--on the Internet. To make the concept tangible, look at Google Drive or Zoho, both of which allow you to create documents (i.e., text, spreadsheets, presentations) on the web. Their online system constantly saves your work, so you don't need to manually save every few minutes or so.

Also, previously, when one thinks of a website, you use either your Notepad or Microsoft Frontpage to create your web pages on your computer, then upload them to your server (either your own server where you or your business is located, or to some webhosting provider). Now, you can create websites with Google Sites, which, among others, reduces your need to learn HTML or similar languages and hosts your webpages in their server. I will stop for now about the benefits of using Google Sites and talk about this later when I go to the Benefits part.

So we have an idea now what it is. What is it not? For example, is email a part of cloud computing?

In its previous form of POP3, email was not cloud computing. In the Philippines, however, we were introduced to its cloud-based form – the webmail. We used to access Yahoo! Mail and Google Mail from the web browser. Those email services, which you can access using your browser from any computer instead of an email client installed in your own computer with the settings preconfigured, are cloud-based.

Chatting, again, in its previous form of Internet Relay Chat (IRC), was not cloud-based, because there is a central server that moderates communication of messages among different users that access the chat through a preconfigured client. However, recent chat services through your web-browser (e.g., Google Hangouts and Yahoo! Mail's chat) would be considered cloud-based.

I am briefly delineating the difference between cloud and non-cloud computing to emphasize how cloud computing delivers its supposed benefits.

Cloud computing has a lot of benefits, and technical papers can list a lot. However, for us mere mortals (i.e., non-IT people), I will focus on what you may need to know.
  • Mobility and device independence (borrowed from the Wikipedia article) – You can access your data (e.g., download your data, make changes, create new file or record) from any device or location so long as you have internet connection and maybe a browser. 
  • Cost – Particularly if you use public clouds (we will define what a public cloud is later). Creating information systems that will give you the intended benefit would be costly both in monetary and experiential terms. Adopting cloud computing avoids the experience cost and as well as capital costs associated with systems development, as well as leveraging on the learning of other users that have used the system (and provided feedback to the cloud service provider).
  • Focus on your core function – For small businesses, public organizations or educational institutions, all of which may not have very big capital budget for information technology, cloud computing allows them to use technology available to support their core function without the need to develop (and spend on) their own systems. Looking back at my previous example of Google Sites, it allows collaboration in creating a website; multiple types of access (down to the page level), and easy addition of content. An additional feature of creating websites with Google Sites is that Google takes care of converting the website to a form that is mobile- and tablet-friendly. As it is Google-hosted, it allows you to also integrate other Google services to the website, like adding a calendar that will display information based on the person's Google account.  Imagine if you will have to write all these in code, and you are not a computer science graduate.
  • Security in data redundancy – While anti-cloud computing individuals say that it is risky, it can actually give IT people peace of mind as they know their data is stored and backed up in an off-site location (i.e. not in the place where the business is located). In the eventuality of disaster in the organization's area of operation, the  company knows that their data is backed up at a place where the disaster is not likely to have taken place as well. Of course, this requires [and does not take off the organization's or its delegated responsible person's responsibility to conduct due] diligence in determining where the cloud service provider actually stores your data. 
  • Collaboration – This feature of Google Docs (an example of a cloud-based software) ensured me buying into the idea. Google Docs (and its sister services, Google Sheets and Google Presentation), allowed multiple users to edit a file at the same time from different computers. Instead of sending back and forth different versions of a file, cloud computing allowed us to work on ONE file instead of sending back and forth different versions, which confused everyone which should they be using. Imagine if you have a web-based project management software, so that different project staff can update their responsibilities in the system, and the project manager can have near realtime update on how the project is doing. 
  • Maintenance – Cloud computing also allows IT administrators (if they use private clouds) to update just the back end (maintained by system administrators) of a computer system without tinkering the software on the front end (used by the end-users). They can simply update everything and the front end will update as they (usually) access the system through a web interface. 
  • Security in maintenance – As mentioned above, cloud computing allows system administrators to update the system without worrying about the end-users' client software. As most of the protocols are in the backend, cloud computing reduces the risk of a non-compliant end-user not updating his/her client software. 
Of course, cloud computing, like any HUMAN ACTIVITY, has its costs and risks. Here are some of the most commonly cited ones:
  • You need an internet connection. For you to access your data, you need to have internet connection. Most of the cloud storage services, like Dropbox and Google Drive, however, allow offline access by downloading your data on your computer and synchronizing it with your online account when you get online. 
  • You don't know where your data is located. As mentioned before, this requires you to conduct due diligence if your data is sensitive (e.g., related to national security or politics). Let me point out, however, that this is not limited to cloud computing. You could store your data in your USB flash drives. But if you are not careful, you could save your data to a virus- or malware-affected storage device, which may either corrupt the data or send it to an unknown person without your knowledge. Again, due diligence is required. 
  • Virus affecting all data in a cloud or networked system. Even if cloud-based systems are inter-connected, it does not mean that anyone (including a virus or piece of malware) can access data to another system without proper credentials (which is one of the foundation principles of information security). So even if a virus finds itself in a cloud storage device, being there does not mean (most of the time, if your system is designed right) will not mean corruption of everything in it. Another way of mitigating this is actually to ensure that you have your data backed up. 
Relatedly, if you are considering of outsourcing some of your information system processes, you should note what the contract says about ownership of the data.

To address the commonly raised concern on security and privacy, organizations may consider a private cloud facility. A private cloud facility harnesses the technical benefits of a public cloud (e.g., doing transactions on the server instead of installing applications in the end-users' computers) while ensuring that their data is in a facility they own. However, private clouds may not be able to leverage the economic benefits of public clouds as the organization will have to operate just as they have with a client-server system (i.e., maintaining a data center with its required supporting facilities, such as an airconditioned and secured room, back-up data storage facility, back-up power source, among others).

Personally, I am looking into the idea of getting a Network Attached Storage (NAS) with cloud capability, such as Western Digital's My Cloud Mirror or Synology's BeyondCloud Mirror. While we do have external hard disk drives, moving them makes the ports prone to wear and tear. I previously had to throw a 500GB external hard disk drive because I cannot access the data anymore (and I didn't know of any other way to get my data). With a NAS with cloud capability, my team can share files in a network environment (only those allowed through a username can access it) and access the same files when I am away through the internet.

Considering the benefits and the risks, and given due diligence in selecting the service provider or the facility, and being reminded of the supposed responsibility of ensuring security in designing ALL TYPES of information systems, I think cloud computing presents a real business case that executives should consider.

Encoded using LibreOffice

Sources:
Wikipedia, "Cloud Computing," accessed on 30 August 2015.
University of North Carolina, "Cloud Computing," accessed http://www.unc.edu/courses/2010spring/law/357c/001/cloudcomputing/examples.html on 30 August 2015.

Monday, December 8, 2014

What's in FOSS for us?

Updated from original posted at http://foss.upm.edu.ph/node/38 and http://castechbytes.blogspot.com/2011/01/units-to-upm-whats-in-foss-for-us.html

I am supposed to have a long introduction for this one, but I decided to remove the management perspective and get right into the opportunities Free and Open Source Software (FOSS) offers to educational institutions.

  1. Economy
  2. Security
  3. Intellectual property
  4. Innovation
  5. Community

Economy is the most common reason for adopting open source software. While not all open source software are free, most of them are. And when we get FOSS, we do not just get the application for free. Most of the time, we also get updates to it for free.

In state universities, which do not have a lot of funds for IT, FOSS really is a sensible option.

Security is an issue in information management that has finally (and thankfully) getting its breakthrough. Not a lot of people in educational institutions care about security (Yes, the security guards are included in this issue, but not only them.) when it comes to computing, but that relaxed attitude poses great risk to administrators and other IT users.

Security is not only about hacking or cracking, or about protecting your computer from theft. Security includes protection of your data from observation, inappropriate copying or corruption. Among these three, we usually neglect the third. We have to remember that data lost is almost the same as (or worse than) data copied.

Open source software have better security configurations than those created by few programmers. This is because the source code (the original, human-readable set of instructions) is viewed by many contributors.

Another issue is about intellectual property. Software piracy and buying pirated software are unethical behavior. No matter what you say your purpose is, the end does not justify the mean of getting pirated software.

Getting free and open source software frees you from this ethical dilemma. Also, you are not worried of pirated software which may either be broken or have malicious software embedded into it.

Innovation refers to the creative juices the FOSS community offers for your real needs. Because FOSS software development is driven by its community of users, FOSS has the tendency to better address the need of its immediate community.

Take note of the term ‘immediate.’ This is the operative word here. In today’s global context, community is always the whole World Wide Web. FOSS, in order to be really responsive, has to cater to particular communities.

Community refers to the unique community-driven and community-based approach among most FOSS. Take a look at all Linux OS distributions, the LibreOffice, FreeMind and other popular FOSS applications. What drives them? It’s not simply being free, but in being community-based.

As mentioned in Innovation, FOSS is community-driven. This results not only to innovation but also in instilling a spirit of unity among the users. People get to share what they know about certain applications, how to do certain things in more efficient and more effective ways, and share issues that may or may not be related to their disciplines or personal lives. Proprietary software does not do that (Notice they almost always use the ‘i’ or ‘my’?).

Need help? You can either read the manual, or ask the community! There is a forum for almost all open source software out there, listening to requests for help and feedback to improve the software.

As for me, I just love to learn something new and use what I learned to use in helping my college do what it needs to do without fear of cost or usage limitations.

Monday, December 1, 2014

Post in Facebook and automatically post to Twitter, too

You may have a Facebook account and a Twitter account. Probably because you have some friends who use one but not the other. Or you yourself have different social behavior for each. Or for some other reason. But probably, there are instances when you want to post one and you want that posted also on the other.

For example, this blog has a Facebook page as well as a Twitter account. When I post this to its own Facebook page, Facebook forwards the same post to Twitter, together with the link back to this post.

To do this, you have to log in to your Facebook account, and then go to the Twitter link page at this URL:

http://www.facebook.com/twitter

This page will show your Facebook pages, each with a button  giving you the option to link it to the appropriate Twitter account. When you click that button, it will ask for your Twitter account username and password. Make sure to check that you are entering the Twitter account where you want the post to appear.

If you want the other way around (i.e., post in Twitter first, and it will appear in Facebook), you can check this post:

https://support.twitter.com/articles/31113-using-twitter-with-facebook

I actually post primarily from Google+ and my posts there go to Facebook and Twitter. I will write how I do that next week. In the meantime, let me know if you have related questions :D


Monday, November 24, 2014

How email service providers tag spam

Updated from original posted on http://castechbytes.blogspot.com/2009/09/how-email-service-providers-tag-spam.html

As a former technical support representative for an Internet Service Provider (ISP) in the United States, I have come to understand how their mail servers work in reducing spam. Of course, the one I would like to discuss is just the basic manner of recognizing spam. Providers may have other means of identifying spam.

First: Let's have a working definition of "spam" as unsolicited email--be it business, advocacy or commercial email. I will not talk about IM spam or spam in other internet-based communication channels.

Identifying Spam


Spam identification is the key to spam reduction, so most email service providers actually focus on that aspect. As soon as they determine that an email is spam, what they do with that it is just a matter of creativity on the email address owner or the email service provider.

There are two key strategies in spam identification: Keyword monitoring and Spam tagging.

Keyword monitoring is (of course) monitoring key words in the email's subject line. Based on the ISP's pre-studied list of words that are most likely to contain spam, ISPs control the distribution of messages that have these words. Instead of going to the recipients' Inbox, they go to the Junk or Bulk mail folder.

This means that if you send email with words such as "tits" or "dicks" in your email, this will probably be sent to the Spam folder of the recipient if the email server has that approach in spam detection.

While this seems logical, the downside of this is not supported by certain groups as this same rule may be applied to the words such as "breasts" or "penis" which may be required in medical professions. Useful emails may be forwarded to the spam folder without knowledge of the recipient (obviously) even though they intend to receive it.

Spam tagging refers to the use of the "Report Spam" or "This is spam" feature of your email service provider to report that the email you are reading is actually a spam. Most of the email service providers have this feature, but the level of effect of the feature varies.

With this feature, what happens usually is that the email you use is transferred to your Spam folder. What you may not know, however, is that what you actually do is not just tag the email as spam, but tag the sender as a spammer.

This indicates that if a certain number of users tag that email sender as a spammer, the email server of the recipient will automatically tag it as spam, resulting to the future emails of that sender to be forwarded to the Spam folder, even for other receivers.

Of course, these days, a combination of the two, together with artificial intelligence, is used to identify spam.

Cost of Spam to Organizations


Spam costs a lot of money for organizations as they have to deal with wasted resources (bandwidth) and time (for deleting spam), not to mention privacy and other security issues, so proper identification of spam is important. Estimates in 2012 indicate that spam costs internet users globally about $20 billion to $50 billion. On other hand, you should be careful with tagging an email as "spam" if you are in an organization where the sender is sending relevant information that only you do not appreciate receiving. You may be costing the inconvenience not only to the sender but also to the other recipients.

If you discover that you  received an email but it is in the Spam folder when it is actually not spam, you can do any of the following:
  1. Use the "Unmark as spam" or similar feature
  2. Add the recipient's email address to your address book. This adds a rule to your email that the sender is a valid contact.

Monday, November 17, 2014

Using Pivot Table to summarize large numerical data in Microsoft Excel

Sometimes, you have large amounts of data in a spreadsheet. The data are related in such a way that separate cells are linked through a corresponding attribute, which appears in an adjacent cell or within the same row. Take for example the table here:

Let's say you want to have a smaller summary table that tells you how much capacity does each of the three grids (Luzon, Visayas, and Mindanao) have. If this is in a spreadsheet application like Microsoft Excel or Google Sheets, you can use Sort to sort the table by the grid column, then insert a row at the last entry for each group. In the corresponding column, you can use the function SUM. You can then create another sheet in the same file so that you can create a table with five rows and two columns. First row is header, last row is totals. It might look like this:

GridDependable Capacity
Luzon
Visayas
Mindanao
Total

However, what if you want to have another way of summarizing it? For example, you want to have a column that says, for each grid, show how much capacity is for Fossil, and how much is for Renewable? Something like this:

FossilRenewableSub-total
Luzon------
Visayas------
Mindanao------
Sub-total------

The task becomes a little bit harder. Of course, this can become even more difficult if you want to report per energy resource, sub-categories of Fossil and Renewable, like this:

FossilRenewableTotal
CoalOilNatural GasFossil Sub-totalHydroGeothermalSolarWindRenewable Sub-total
Luzon--------------------
Visayas--------------------
Mindanao--------------------
Sub-total--------------------

Of course, we are arranging this based on how you think is the logical way of presenting the data. What if you presented it to your supervisor, who says you should present it this way instead:

LuzonVisayasMindanaoSub-total
FossilCoal--------
Oil--------
Natural Gas--------
Fossil Sub-total--------
RenewableHydro--------
Geothermal--------
Solar--------
Wind--------
Renewable Sub-total--------
GRAND TOTAL--------

While theoretically, it should not be all that difficult, considering it is just a matter of re-organizing the data that were already calculated, there is actually a feature in spreadsheet applications that allows you to do automatic summarizing of numerical data (in sums and averages, among others) and arrange them by certain identifiable categories. It is called the Pivot Table tool.

With the advent of mobile version of applications, such as Android app of Google's Sheets, Pivot Table is one of those features that have yet to be made available to the mobile version. I guess it is a bit too sophisticated to be done in a mobile operating system. Nonetheless, in Google Sheets, if you have created it already in the web (desktop) version, you can make changes to the source sheet, and it will be updated in the pivot table sheet.

Pivot Table is a tool in spreadsheet applications (available in Microsoft Excel, LibreOffice Calc, and Google Sheets. It allows you to summarize large sets of numerical data by using columns that have repeating or grouping data. In the table, it refers to the columns on "Grid," "Source Type," "Clean or Not Clean," and "Source."

Before using Pivot Table


Pivot Table will use the values in the first row of the worksheet as labels of the output summary table (called the Pivot Table Report), so you must make sure that the data in the worksheet source are appropriately labeled.

Also, you have to ensure that the data is "clean," meaning you are sure that each cell contains the data that should be there. Your source table must be organized in such a way that each row is one record. We will discuss entity and data cleaning in the future, but for now, just ensure that one row corresponds to one record that is unique. Among others, this means that the data in that cell corresponds to that record (row) in that kind of variable (label or column).

Also, if you intend to make mathematical calculations, make sure that all numbers are formatted as numbers. If you have a cell wherein you manually inserted a comma, for example, your spreadsheet application will convert that number into text. If you need to use a comma for thousands and millions places, you can do this by using the cell formatting option of your spreadsheet program.

In this post, we will discuss how to do that in Excel. I will have separate posts for Calc and Google Sheets.

With Microsoft Excel


If you are using Microsoft Excel 2010, you can create a pivot table for your data by going to the Insert tab (Just make sure your cursor is in a cell within the table you want to summarize.), and then click Pivot Table.



You will be asked where you will source your data and where you want to put the Pivot Table Report. It should look like this:


Assuming you choose "New Worksheet," (and I recommend this, so that you can change the arrangement of the source table without affecting the output Pivot Table Report), and after you click "OK," you will go to the new sheet, with the column labels on the right side of the screen. Put a check on the labels that you want to include in your report.




As you see above, merely checking the labels will just add the labels in the "Row Labels" field. You have to move the labels to where you want it to display. In our example, we want each Grid to have a separate column, so we will move "Grid" to the "Column Labels" field, like so (You can drag and drop the label):

(In Excel, it can automatically detect that you want to add the values in the "Dependable Capacity" column, because the contents of that label are numerical values. However, you can also tell your spreadsheet application to summarize by using Average, or Count, Maximum, or Minimum, among others, depending on your spreadsheet application.)


If you want to refine your output by removing some specific values (e.g., you want to remove power plants that are in Mindanao), you can click the downward arrow. It will display a menu with finer controls.


Microsoft Excel allows you to produce a Pivot Table Report using multiple sheets, but I think that is rarely used in teaching. In case you want to know how to do that, you can read the official support article here.

Use of Pivot Table


Aside from summarizing, Pivot Table allows you to see possible relationships among the variables in your data because they are summarized. While the apparent relationships may not have the same confidence level as using statistical tools, such as Spearman Rho correlation, it can give you an idea which variables to test.

Wanna give it a try? You can test your knowledge by downloading clean data from The World Bank's Open Data initiative, available here.

Read how to use pivot table in LibreOffice Calc
Read how to use pivot table in Google Sheets

Using Pivot Table in Google Sheets

Sometimes, you have large amounts of data in a spreadsheet. The data are related in such a way that separate cells are linked through a corresponding attribute, which appears in an adjacent cell or within the same row. Take for example the table here:

Let's say you want to have a smaller summary table that tells you how much capacity does each of the three grids (Luzon, Visayas, and Mindanao) have. If this is in a spreadsheet application like Microsoft Excel or Google Sheets, you can use Sort to sort the table by the grid column, then insert a row at the last entry for each group. In the corresponding column, you can use the function SUM. You can then create another sheet in the same file so that you can create a table with five rows and two columns. First row is header, last row is totals. It might look like this:

GridDependable Capacity
Luzon
Visayas
Mindanao
Total

However, what if you want to have another way of summarizing it? For example, you want to have a column that says, for each grid, show how much capacity is for Fossil, and how much is for Renewable? Something like this:

FossilRenewableSub-total
Luzon------
Visayas------
Mindanao------
Sub-total------

The task becomes a little bit harder. Of course, this can become even more difficult if you want to report per energy resource, sub-categories of Fossil and Renewable, like this:

FossilRenewableTotal
CoalOilNatural GasFossil Sub-totalHydroGeothermalSolarWindRenewable Sub-total
Luzon--------------------
Visayas--------------------
Mindanao--------------------
Sub-total--------------------

Of course, we are arranging this based on how you think is the logical way of presenting the data. What if you presented it to your supervisor, who says you should present it this way instead:

LuzonVisayasMindanaoSub-total
FossilCoal--------
Oil--------
Natural Gas--------
Fossil Sub-total--------
RenewableHydro--------
Geothermal--------
Solar--------
Wind--------
Renewable Sub-total--------
GRAND TOTAL--------

While theoretically, it should not be all that difficult, considering it is just a matter of re-organizing the data that were already calculated, there is actually a feature in spreadsheet applications that allows you to do automatic summarizing of numerical data (in sums and averages, among others) and arrange them by certain identifiable categories. It is called the Pivot Table tool.

With the advent of mobile version of applications, such as Android app of Google's Sheets, Pivot Table is one of those features that have yet to be made available to the mobile version. I guess it is a bit too sophisticated to be done in a mobile operating system. Nonetheless, in Google Sheets, if you have created it already in the web (desktop) version, you can make changes to the source sheet, and it will be updated in the pivot table sheet.

Pivot Table is a tool in spreadsheet applications (available in Microsoft Excel, LibreOffice Calc, and Google Sheets. It allows you to summarize large sets of numerical data by using columns that have repeating or grouping data. In the table, it refers to the columns on "Grid," "Source Type," "Clean or Not Clean," and "Source."

Before using Pivot Table


Pivot Table will use the values in the first row of the worksheet as labels of the output summary table (called the Pivot Table Report), so you must make sure that the data in the worksheet source are appropriately labeled.

Also, you have to ensure that the data is "clean," meaning you are sure that each cell contains the data that should be there. Your source table must be organized in such a way that each row is one record. We will discuss entity and data cleaning in the future, but for now, just ensure that one row corresponds to one record that is unique. Among others, this means that the data in that cell corresponds to that record (row) in that kind of variable (label or column).

Also, if you intend to make mathematical calculations, make sure that all numbers are formatted as numbers. If you have a cell wherein you manually inserted a comma, for example, your spreadsheet application will convert that number into text. If you need to use a comma for thousands and millions places, you can do this by using the cell formatting option of your spreadsheet program.

In this post, we will discuss how to do that with Google Sheets.

With Google Sheets


To create Pivot Table in Google Sheets, you can go to the Data menu and select "Pivot table report." Again, to streamline the process, make sure your cursor is in the table source of your desired Report.


This will show you the screen similar to Calc, where you can choose which labels you want to place where. Unfortunately, unlike Calc, you cannot drag and drop the labels. You have to click "Add Field" in each "Field" to use it as a label. Like in the previous spreadsheet applications, you can use multiple levels per field.







Like in the previous spreadsheet programs, you can filter out data depending on the available labels by adding a field in the "Filter" field.



Click here for more references about Google Sheet's Pivot Table.

Use of Pivot Table


Aside from summarizing, Pivot Table allows you to see possible relationships among the variables in your data because they are summarized. While the apparent relationships may not have the same confidence level as using statistical tools, such as Spearman Rho correlation, it can give you an idea which variables to test.

Wanna give it a try? You can test your knowledge by downloading clean data from The World Bank's Open Data initiative, available here.

Using Pivot Table in Calc

Sometimes, you have large amounts of data in a spreadsheet. The data are related in such a way that separate cells are linked through a corresponding attribute, which appears in an adjacent cell or within the same row. Take for example the table here:

Let's say you want to have a smaller summary table that tells you how much capacity does each of the three grids (Luzon, Visayas, and Mindanao) have. If this is in a spreadsheet application like Microsoft Excel or Google Sheets, you can use Sort to sort the table by the grid column, then insert a row at the last entry for each group. In the corresponding column, you can use the function SUM. You can then create another sheet in the same file so that you can create a table with five rows and two columns. First row is header, last row is totals. It might look like this:

GridDependable Capacity
Luzon
Visayas
Mindanao
Total

However, what if you want to have another way of summarizing it? For example, you want to have a column that says, for each grid, show how much capacity is for Fossil, and how much is for Renewable? Something like this:

FossilRenewableSub-total
Luzon------
Visayas------
Mindanao------
Sub-total------

The task becomes a little bit harder. Of course, this can become even more difficult if you want to report per energy resource, sub-categories of Fossil and Renewable, like this:

FossilRenewableTotal
CoalOilNatural GasFossil Sub-totalHydroGeothermalSolarWindRenewable Sub-total
Luzon--------------------
Visayas--------------------
Mindanao--------------------
Sub-total--------------------

Of course, we are arranging this based on how you think is the logical way of presenting the data. What if you presented it to your supervisor, who says you should present it this way instead:

LuzonVisayasMindanaoSub-total
FossilCoal--------
Oil--------
Natural Gas--------
Fossil Sub-total--------
RenewableHydro--------
Geothermal--------
Solar--------
Wind--------
Renewable Sub-total--------
GRAND TOTAL--------

While theoretically, it should not be all that difficult, considering it is just a matter of re-organizing the data that were already calculated, there is actually a feature in spreadsheet applications that allows you to do automatic summarizing of numerical data (in sums and averages, among others) and arrange them by certain identifiable categories. It is called the Pivot Table tool.

With the advent of mobile version of applications, such as Android app of Google's Sheets, Pivot Table is one of those features that have yet to be made available to the mobile version. I guess it is a bit too sophisticated to be done in a mobile operating system. Nonetheless, in Google Sheets, if you have created it already in the web (desktop) version, you can make changes to the source sheet, and it will be updated in the pivot table sheet.

Pivot Table is a tool in spreadsheet applications (available in Microsoft Excel, LibreOffice Calc, and Google Sheets. It allows you to summarize large sets of numerical data by using columns that have repeating or grouping data. In the table, it refers to the columns on "Grid," "Source Type," "Clean or Not Clean," and "Source."

Before using Pivot Table


Pivot Table will use the values in the first row of the worksheet as labels of the output summary table (called the Pivot Table Report), so you must make sure that the data in the worksheet source are appropriately labeled.

Also, you have to ensure that the data is "clean," meaning you are sure that each cell contains the data that should be there. Your source table must be organized in such a way that each row is one record. We will discuss entity and data cleaning in the future, but for now, just ensure that one row corresponds to one record that is unique. Among others, this means that the data in that cell corresponds to that record (row) in that kind of variable (label or column).

Also, if you intend to make mathematical calculations, make sure that all numbers are formatted as numbers. If you have a cell wherein you manually inserted a comma, for example, your spreadsheet application will convert that number into text. If you need to use a comma for thousands and millions places, you can do this by using the cell formatting option of your spreadsheet program.

In this post, we will use LibreOffice or OpenOffice.Org Calc.

With LibreOffice Calc

For LibreOffice Calc, you can access the Pivot Table function by going to Data menu and selecting Pivot Table, like below. To streamline the process, put the cursor in the table where you know you will source the Pivot Table Report.


Like in Excel, Calc will ask where you want to get the data from:


Calc will then show you where you can place your labels. You can drag and drop the labels.

(In Calc, you have to make sure that you select the appropriate label in the "Data Fields." This is where the values will be summarized, which maybe summation, counting, maximum, or minimum. To change what type of operation you want to do, click the column label in Data Fields, choose, and then click "OK.")


 If you have multiple level labels (e.g., in rows, you will display first the Type of Source, then Source), you have to arrange it like with the higher level positioned first. Notice below that in the "Row Fields," the label "Source Type" is positioned first.


Your Pivot Table Report should look like this:


Like in Excel, if you want to refine the Report, just click on the downward arrows. You can also tell Calc to include only certain records based on certain value parameters, like include only records where the Dependable Capacity is greater than 10. To do that, just click the A1 Cell, and it will open up following Filter dialog box:


Unfortunately, if you make any changes in the source table, it won't automatically update the Pivot Table Report. But it can easily be updated. Just right-click any cell within the Report, and select "Refresh."

Using that process, you can also see the "Edit Layout." That will allow you to change the placement of labels, in case your supervisor wants to re-arrange the columns.

Click here for more references about LibreOffice Calc's Pivot Table.

On how to do this with Google Sheets, click here.

Use of Pivot Table


Aside from summarizing, Pivot Table allows you to see possible relationships among the variables in your data because they are summarized. While the apparent relationships may not have the same confidence level as using statistical tools, such as Spearman Rho correlation, it can give you an idea which variables to test.

Wanna give it a try? You can test your knowledge by downloading clean data from The World Bank's Open Data initiative, available here.