Monday, December 8, 2014

What's in FOSS for us?

Updated from original posted at http://foss.upm.edu.ph/node/38 and http://castechbytes.blogspot.com/2011/01/units-to-upm-whats-in-foss-for-us.html

I am supposed to have a long introduction for this one, but I decided to remove the management perspective and get right into the opportunities Free and Open Source Software (FOSS) offers to educational institutions.

  1. Economy
  2. Security
  3. Intellectual property
  4. Innovation
  5. Community

Economy is the most common reason for adopting open source software. While not all open source software are free, most of them are. And when we get FOSS, we do not just get the application for free. Most of the time, we also get updates to it for free.

In state universities, which do not have a lot of funds for IT, FOSS really is a sensible option.

Security is an issue in information management that has finally (and thankfully) getting its breakthrough. Not a lot of people in educational institutions care about security (Yes, the security guards are included in this issue, but not only them.) when it comes to computing, but that relaxed attitude poses great risk to administrators and other IT users.

Security is not only about hacking or cracking, or about protecting your computer from theft. Security includes protection of your data from observation, inappropriate copying or corruption. Among these three, we usually neglect the third. We have to remember that data lost is almost the same as (or worse than) data copied.

Open source software have better security configurations than those created by few programmers. This is because the source code (the original, human-readable set of instructions) is viewed by many contributors.

Another issue is about intellectual property. Software piracy and buying pirated software are unethical behavior. No matter what you say your purpose is, the end does not justify the mean of getting pirated software.

Getting free and open source software frees you from this ethical dilemma. Also, you are not worried of pirated software which may either be broken or have malicious software embedded into it.

Innovation refers to the creative juices the FOSS community offers for your real needs. Because FOSS software development is driven by its community of users, FOSS has the tendency to better address the need of its immediate community.

Take note of the term ‘immediate.’ This is the operative word here. In today’s global context, community is always the whole World Wide Web. FOSS, in order to be really responsive, has to cater to particular communities.

Community refers to the unique community-driven and community-based approach among most FOSS. Take a look at all Linux OS distributions, the LibreOffice, FreeMind and other popular FOSS applications. What drives them? It’s not simply being free, but in being community-based.

As mentioned in Innovation, FOSS is community-driven. This results not only to innovation but also in instilling a spirit of unity among the users. People get to share what they know about certain applications, how to do certain things in more efficient and more effective ways, and share issues that may or may not be related to their disciplines or personal lives. Proprietary software does not do that (Notice they almost always use the ‘i’ or ‘my’?).

Need help? You can either read the manual, or ask the community! There is a forum for almost all open source software out there, listening to requests for help and feedback to improve the software.

As for me, I just love to learn something new and use what I learned to use in helping my college do what it needs to do without fear of cost or usage limitations.

Monday, December 1, 2014

Post in Facebook and automatically post to Twitter, too

You may have a Facebook account and a Twitter account. Probably because you have some friends who use one but not the other. Or you yourself have different social behavior for each. Or for some other reason. But probably, there are instances when you want to post one and you want that posted also on the other.

For example, this blog has a Facebook page as well as a Twitter account. When I post this to its own Facebook page, Facebook forwards the same post to Twitter, together with the link back to this post.

To do this, you have to log in to your Facebook account, and then go to the Twitter link page at this URL:

http://www.facebook.com/twitter

This page will show your Facebook pages, each with a button  giving you the option to link it to the appropriate Twitter account. When you click that button, it will ask for your Twitter account username and password. Make sure to check that you are entering the Twitter account where you want the post to appear.

If you want the other way around (i.e., post in Twitter first, and it will appear in Facebook), you can check this post:

https://support.twitter.com/articles/31113-using-twitter-with-facebook

I actually post primarily from Google+ and my posts there go to Facebook and Twitter. I will write how I do that next week. In the meantime, let me know if you have related questions :D


Monday, November 24, 2014

How email service providers tag spam

Updated from original posted on http://castechbytes.blogspot.com/2009/09/how-email-service-providers-tag-spam.html

As a former technical support representative for an Internet Service Provider (ISP) in the United States, I have come to understand how their mail servers work in reducing spam. Of course, the one I would like to discuss is just the basic manner of recognizing spam. Providers may have other means of identifying spam.

First: Let's have a working definition of "spam" as unsolicited email--be it business, advocacy or commercial email. I will not talk about IM spam or spam in other internet-based communication channels.

Identifying Spam


Spam identification is the key to spam reduction, so most email service providers actually focus on that aspect. As soon as they determine that an email is spam, what they do with that it is just a matter of creativity on the email address owner or the email service provider.

There are two key strategies in spam identification: Keyword monitoring and Spam tagging.

Keyword monitoring is (of course) monitoring key words in the email's subject line. Based on the ISP's pre-studied list of words that are most likely to contain spam, ISPs control the distribution of messages that have these words. Instead of going to the recipients' Inbox, they go to the Junk or Bulk mail folder.

This means that if you send email with words such as "tits" or "dicks" in your email, this will probably be sent to the Spam folder of the recipient if the email server has that approach in spam detection.

While this seems logical, the downside of this is not supported by certain groups as this same rule may be applied to the words such as "breasts" or "penis" which may be required in medical professions. Useful emails may be forwarded to the spam folder without knowledge of the recipient (obviously) even though they intend to receive it.

Spam tagging refers to the use of the "Report Spam" or "This is spam" feature of your email service provider to report that the email you are reading is actually a spam. Most of the email service providers have this feature, but the level of effect of the feature varies.

With this feature, what happens usually is that the email you use is transferred to your Spam folder. What you may not know, however, is that what you actually do is not just tag the email as spam, but tag the sender as a spammer.

This indicates that if a certain number of users tag that email sender as a spammer, the email server of the recipient will automatically tag it as spam, resulting to the future emails of that sender to be forwarded to the Spam folder, even for other receivers.

Of course, these days, a combination of the two, together with artificial intelligence, is used to identify spam.

Cost of Spam to Organizations


Spam costs a lot of money for organizations as they have to deal with wasted resources (bandwidth) and time (for deleting spam), not to mention privacy and other security issues, so proper identification of spam is important. Estimates in 2012 indicate that spam costs internet users globally about $20 billion to $50 billion. On other hand, you should be careful with tagging an email as "spam" if you are in an organization where the sender is sending relevant information that only you do not appreciate receiving. You may be costing the inconvenience not only to the sender but also to the other recipients.

If you discover that you  received an email but it is in the Spam folder when it is actually not spam, you can do any of the following:
  1. Use the "Unmark as spam" or similar feature
  2. Add the recipient's email address to your address book. This adds a rule to your email that the sender is a valid contact.

Monday, November 17, 2014

Using Pivot Table to summarize large numerical data in Microsoft Excel

Sometimes, you have large amounts of data in a spreadsheet. The data are related in such a way that separate cells are linked through a corresponding attribute, which appears in an adjacent cell or within the same row. Take for example the table here:

Let's say you want to have a smaller summary table that tells you how much capacity does each of the three grids (Luzon, Visayas, and Mindanao) have. If this is in a spreadsheet application like Microsoft Excel or Google Sheets, you can use Sort to sort the table by the grid column, then insert a row at the last entry for each group. In the corresponding column, you can use the function SUM. You can then create another sheet in the same file so that you can create a table with five rows and two columns. First row is header, last row is totals. It might look like this:

GridDependable Capacity
Luzon
Visayas
Mindanao
Total

However, what if you want to have another way of summarizing it? For example, you want to have a column that says, for each grid, show how much capacity is for Fossil, and how much is for Renewable? Something like this:

FossilRenewableSub-total
Luzon------
Visayas------
Mindanao------
Sub-total------

The task becomes a little bit harder. Of course, this can become even more difficult if you want to report per energy resource, sub-categories of Fossil and Renewable, like this:

FossilRenewableTotal
CoalOilNatural GasFossil Sub-totalHydroGeothermalSolarWindRenewable Sub-total
Luzon--------------------
Visayas--------------------
Mindanao--------------------
Sub-total--------------------

Of course, we are arranging this based on how you think is the logical way of presenting the data. What if you presented it to your supervisor, who says you should present it this way instead:

LuzonVisayasMindanaoSub-total
FossilCoal--------
Oil--------
Natural Gas--------
Fossil Sub-total--------
RenewableHydro--------
Geothermal--------
Solar--------
Wind--------
Renewable Sub-total--------
GRAND TOTAL--------

While theoretically, it should not be all that difficult, considering it is just a matter of re-organizing the data that were already calculated, there is actually a feature in spreadsheet applications that allows you to do automatic summarizing of numerical data (in sums and averages, among others) and arrange them by certain identifiable categories. It is called the Pivot Table tool.

With the advent of mobile version of applications, such as Android app of Google's Sheets, Pivot Table is one of those features that have yet to be made available to the mobile version. I guess it is a bit too sophisticated to be done in a mobile operating system. Nonetheless, in Google Sheets, if you have created it already in the web (desktop) version, you can make changes to the source sheet, and it will be updated in the pivot table sheet.

Pivot Table is a tool in spreadsheet applications (available in Microsoft Excel, LibreOffice Calc, and Google Sheets. It allows you to summarize large sets of numerical data by using columns that have repeating or grouping data. In the table, it refers to the columns on "Grid," "Source Type," "Clean or Not Clean," and "Source."

Before using Pivot Table


Pivot Table will use the values in the first row of the worksheet as labels of the output summary table (called the Pivot Table Report), so you must make sure that the data in the worksheet source are appropriately labeled.

Also, you have to ensure that the data is "clean," meaning you are sure that each cell contains the data that should be there. Your source table must be organized in such a way that each row is one record. We will discuss entity and data cleaning in the future, but for now, just ensure that one row corresponds to one record that is unique. Among others, this means that the data in that cell corresponds to that record (row) in that kind of variable (label or column).

Also, if you intend to make mathematical calculations, make sure that all numbers are formatted as numbers. If you have a cell wherein you manually inserted a comma, for example, your spreadsheet application will convert that number into text. If you need to use a comma for thousands and millions places, you can do this by using the cell formatting option of your spreadsheet program.

In this post, we will discuss how to do that in Excel. I will have separate posts for Calc and Google Sheets.

With Microsoft Excel


If you are using Microsoft Excel 2010, you can create a pivot table for your data by going to the Insert tab (Just make sure your cursor is in a cell within the table you want to summarize.), and then click Pivot Table.



You will be asked where you will source your data and where you want to put the Pivot Table Report. It should look like this:


Assuming you choose "New Worksheet," (and I recommend this, so that you can change the arrangement of the source table without affecting the output Pivot Table Report), and after you click "OK," you will go to the new sheet, with the column labels on the right side of the screen. Put a check on the labels that you want to include in your report.




As you see above, merely checking the labels will just add the labels in the "Row Labels" field. You have to move the labels to where you want it to display. In our example, we want each Grid to have a separate column, so we will move "Grid" to the "Column Labels" field, like so (You can drag and drop the label):

(In Excel, it can automatically detect that you want to add the values in the "Dependable Capacity" column, because the contents of that label are numerical values. However, you can also tell your spreadsheet application to summarize by using Average, or Count, Maximum, or Minimum, among others, depending on your spreadsheet application.)


If you want to refine your output by removing some specific values (e.g., you want to remove power plants that are in Mindanao), you can click the downward arrow. It will display a menu with finer controls.


Microsoft Excel allows you to produce a Pivot Table Report using multiple sheets, but I think that is rarely used in teaching. In case you want to know how to do that, you can read the official support article here.

Use of Pivot Table


Aside from summarizing, Pivot Table allows you to see possible relationships among the variables in your data because they are summarized. While the apparent relationships may not have the same confidence level as using statistical tools, such as Spearman Rho correlation, it can give you an idea which variables to test.

Wanna give it a try? You can test your knowledge by downloading clean data from The World Bank's Open Data initiative, available here.

Read how to use pivot table in LibreOffice Calc
Read how to use pivot table in Google Sheets

Using Pivot Table in Google Sheets

Sometimes, you have large amounts of data in a spreadsheet. The data are related in such a way that separate cells are linked through a corresponding attribute, which appears in an adjacent cell or within the same row. Take for example the table here:

Let's say you want to have a smaller summary table that tells you how much capacity does each of the three grids (Luzon, Visayas, and Mindanao) have. If this is in a spreadsheet application like Microsoft Excel or Google Sheets, you can use Sort to sort the table by the grid column, then insert a row at the last entry for each group. In the corresponding column, you can use the function SUM. You can then create another sheet in the same file so that you can create a table with five rows and two columns. First row is header, last row is totals. It might look like this:

GridDependable Capacity
Luzon
Visayas
Mindanao
Total

However, what if you want to have another way of summarizing it? For example, you want to have a column that says, for each grid, show how much capacity is for Fossil, and how much is for Renewable? Something like this:

FossilRenewableSub-total
Luzon------
Visayas------
Mindanao------
Sub-total------

The task becomes a little bit harder. Of course, this can become even more difficult if you want to report per energy resource, sub-categories of Fossil and Renewable, like this:

FossilRenewableTotal
CoalOilNatural GasFossil Sub-totalHydroGeothermalSolarWindRenewable Sub-total
Luzon--------------------
Visayas--------------------
Mindanao--------------------
Sub-total--------------------

Of course, we are arranging this based on how you think is the logical way of presenting the data. What if you presented it to your supervisor, who says you should present it this way instead:

LuzonVisayasMindanaoSub-total
FossilCoal--------
Oil--------
Natural Gas--------
Fossil Sub-total--------
RenewableHydro--------
Geothermal--------
Solar--------
Wind--------
Renewable Sub-total--------
GRAND TOTAL--------

While theoretically, it should not be all that difficult, considering it is just a matter of re-organizing the data that were already calculated, there is actually a feature in spreadsheet applications that allows you to do automatic summarizing of numerical data (in sums and averages, among others) and arrange them by certain identifiable categories. It is called the Pivot Table tool.

With the advent of mobile version of applications, such as Android app of Google's Sheets, Pivot Table is one of those features that have yet to be made available to the mobile version. I guess it is a bit too sophisticated to be done in a mobile operating system. Nonetheless, in Google Sheets, if you have created it already in the web (desktop) version, you can make changes to the source sheet, and it will be updated in the pivot table sheet.

Pivot Table is a tool in spreadsheet applications (available in Microsoft Excel, LibreOffice Calc, and Google Sheets. It allows you to summarize large sets of numerical data by using columns that have repeating or grouping data. In the table, it refers to the columns on "Grid," "Source Type," "Clean or Not Clean," and "Source."

Before using Pivot Table


Pivot Table will use the values in the first row of the worksheet as labels of the output summary table (called the Pivot Table Report), so you must make sure that the data in the worksheet source are appropriately labeled.

Also, you have to ensure that the data is "clean," meaning you are sure that each cell contains the data that should be there. Your source table must be organized in such a way that each row is one record. We will discuss entity and data cleaning in the future, but for now, just ensure that one row corresponds to one record that is unique. Among others, this means that the data in that cell corresponds to that record (row) in that kind of variable (label or column).

Also, if you intend to make mathematical calculations, make sure that all numbers are formatted as numbers. If you have a cell wherein you manually inserted a comma, for example, your spreadsheet application will convert that number into text. If you need to use a comma for thousands and millions places, you can do this by using the cell formatting option of your spreadsheet program.

In this post, we will discuss how to do that with Google Sheets.

With Google Sheets


To create Pivot Table in Google Sheets, you can go to the Data menu and select "Pivot table report." Again, to streamline the process, make sure your cursor is in the table source of your desired Report.


This will show you the screen similar to Calc, where you can choose which labels you want to place where. Unfortunately, unlike Calc, you cannot drag and drop the labels. You have to click "Add Field" in each "Field" to use it as a label. Like in the previous spreadsheet applications, you can use multiple levels per field.







Like in the previous spreadsheet programs, you can filter out data depending on the available labels by adding a field in the "Filter" field.



Click here for more references about Google Sheet's Pivot Table.

Use of Pivot Table


Aside from summarizing, Pivot Table allows you to see possible relationships among the variables in your data because they are summarized. While the apparent relationships may not have the same confidence level as using statistical tools, such as Spearman Rho correlation, it can give you an idea which variables to test.

Wanna give it a try? You can test your knowledge by downloading clean data from The World Bank's Open Data initiative, available here.

Using Pivot Table in Calc

Sometimes, you have large amounts of data in a spreadsheet. The data are related in such a way that separate cells are linked through a corresponding attribute, which appears in an adjacent cell or within the same row. Take for example the table here:

Let's say you want to have a smaller summary table that tells you how much capacity does each of the three grids (Luzon, Visayas, and Mindanao) have. If this is in a spreadsheet application like Microsoft Excel or Google Sheets, you can use Sort to sort the table by the grid column, then insert a row at the last entry for each group. In the corresponding column, you can use the function SUM. You can then create another sheet in the same file so that you can create a table with five rows and two columns. First row is header, last row is totals. It might look like this:

GridDependable Capacity
Luzon
Visayas
Mindanao
Total

However, what if you want to have another way of summarizing it? For example, you want to have a column that says, for each grid, show how much capacity is for Fossil, and how much is for Renewable? Something like this:

FossilRenewableSub-total
Luzon------
Visayas------
Mindanao------
Sub-total------

The task becomes a little bit harder. Of course, this can become even more difficult if you want to report per energy resource, sub-categories of Fossil and Renewable, like this:

FossilRenewableTotal
CoalOilNatural GasFossil Sub-totalHydroGeothermalSolarWindRenewable Sub-total
Luzon--------------------
Visayas--------------------
Mindanao--------------------
Sub-total--------------------

Of course, we are arranging this based on how you think is the logical way of presenting the data. What if you presented it to your supervisor, who says you should present it this way instead:

LuzonVisayasMindanaoSub-total
FossilCoal--------
Oil--------
Natural Gas--------
Fossil Sub-total--------
RenewableHydro--------
Geothermal--------
Solar--------
Wind--------
Renewable Sub-total--------
GRAND TOTAL--------

While theoretically, it should not be all that difficult, considering it is just a matter of re-organizing the data that were already calculated, there is actually a feature in spreadsheet applications that allows you to do automatic summarizing of numerical data (in sums and averages, among others) and arrange them by certain identifiable categories. It is called the Pivot Table tool.

With the advent of mobile version of applications, such as Android app of Google's Sheets, Pivot Table is one of those features that have yet to be made available to the mobile version. I guess it is a bit too sophisticated to be done in a mobile operating system. Nonetheless, in Google Sheets, if you have created it already in the web (desktop) version, you can make changes to the source sheet, and it will be updated in the pivot table sheet.

Pivot Table is a tool in spreadsheet applications (available in Microsoft Excel, LibreOffice Calc, and Google Sheets. It allows you to summarize large sets of numerical data by using columns that have repeating or grouping data. In the table, it refers to the columns on "Grid," "Source Type," "Clean or Not Clean," and "Source."

Before using Pivot Table


Pivot Table will use the values in the first row of the worksheet as labels of the output summary table (called the Pivot Table Report), so you must make sure that the data in the worksheet source are appropriately labeled.

Also, you have to ensure that the data is "clean," meaning you are sure that each cell contains the data that should be there. Your source table must be organized in such a way that each row is one record. We will discuss entity and data cleaning in the future, but for now, just ensure that one row corresponds to one record that is unique. Among others, this means that the data in that cell corresponds to that record (row) in that kind of variable (label or column).

Also, if you intend to make mathematical calculations, make sure that all numbers are formatted as numbers. If you have a cell wherein you manually inserted a comma, for example, your spreadsheet application will convert that number into text. If you need to use a comma for thousands and millions places, you can do this by using the cell formatting option of your spreadsheet program.

In this post, we will use LibreOffice or OpenOffice.Org Calc.

With LibreOffice Calc

For LibreOffice Calc, you can access the Pivot Table function by going to Data menu and selecting Pivot Table, like below. To streamline the process, put the cursor in the table where you know you will source the Pivot Table Report.


Like in Excel, Calc will ask where you want to get the data from:


Calc will then show you where you can place your labels. You can drag and drop the labels.

(In Calc, you have to make sure that you select the appropriate label in the "Data Fields." This is where the values will be summarized, which maybe summation, counting, maximum, or minimum. To change what type of operation you want to do, click the column label in Data Fields, choose, and then click "OK.")


 If you have multiple level labels (e.g., in rows, you will display first the Type of Source, then Source), you have to arrange it like with the higher level positioned first. Notice below that in the "Row Fields," the label "Source Type" is positioned first.


Your Pivot Table Report should look like this:


Like in Excel, if you want to refine the Report, just click on the downward arrows. You can also tell Calc to include only certain records based on certain value parameters, like include only records where the Dependable Capacity is greater than 10. To do that, just click the A1 Cell, and it will open up following Filter dialog box:


Unfortunately, if you make any changes in the source table, it won't automatically update the Pivot Table Report. But it can easily be updated. Just right-click any cell within the Report, and select "Refresh."

Using that process, you can also see the "Edit Layout." That will allow you to change the placement of labels, in case your supervisor wants to re-arrange the columns.

Click here for more references about LibreOffice Calc's Pivot Table.

On how to do this with Google Sheets, click here.

Use of Pivot Table


Aside from summarizing, Pivot Table allows you to see possible relationships among the variables in your data because they are summarized. While the apparent relationships may not have the same confidence level as using statistical tools, such as Spearman Rho correlation, it can give you an idea which variables to test.

Wanna give it a try? You can test your knowledge by downloading clean data from The World Bank's Open Data initiative, available here.