Assignment 2 – Intro to SQL DML

Due Date: Feb 14, 2017

Introduction

  • In this exercise you’ll be working in the nys schema within the gep664 database.
  • This exercise is worth 6 points. Each question is work 1/2 a point.
  • Be sure to follow the proper conventions for SQL statements and the guidelines for submitting assignments, as posted on the course website. Failure to do so will result in point deductions.

Background

In this exercise you’ll be working in the nys schema within the gep664 database. This schema includes four tables with census data that describe the economy and labor force in NY State.

nys.acs2014_labforce
resident labor force participation and employment from the 2010-2014 American Community Survey
nys.zbp2014
number of business establishments with employees and payroll from the 2014 ZIP Code Business Patterns
nys.zbp_flags
codes and descriptions for footnotes in the ZBP table
nys.zips
list of all ZIP Codes, with their postal cities and the ZCTA that each ZIP is correlated with (for residential ZIPs) or located in (for non-residential ZIPs)

The ACS data on the labor force is reported by ZIP Code Tabulation Area (ZCTAs), while the ZBP data on business establishments is reported by ZIP Code. What’s the difference?

Even though we think of ZIP Codes as areas, they are not. ZIP Codes are identifiers assigned by the US Postal Service to addresses along street segments. The Census Bureau, as well as private agencies and services like Google maps, will take this data and attempt to create areas based on concentrations of addresses that share the same ZIP. The Census Bureau does this by aggregating census blocks where the majority of addresses share the same ZIP Code; the resulting areas are called ZCTAs. Some ZIP Codes cannot be aggregated into areas, either because they represent one large organization or building that has its own ZIP, or because the code represents a large cluster of PO Boxes at a post office. As a result, there are more ZIP Codes than ZCTAs.

In our database, the ACS labor force data is reported by ZCTA, while the ZBP business establishment data is summarized by ZIP Code. You can aggregate ZIP Code data to the ZCTA level by assigning the non-residential ZIPs to the ZCTA where they are physically located. The zips table in our database allows you to do this, as each ZIP Code is related to a ZCTA.

Lastly, the data in the ACS tables represent estimates (at a 90% confidence interval) with margins of error. So for every estimate there are two columns: the estimate itself, and a margin of error for the estimate. The latter are stored in columns with the suffix _me. For example the labor force (labforce) for ZCTA 10001 was 14,547, plus or minus 947 (labforce_me).

Questions

For the following, provide the SQL statement and the record count that is automatically returned by pgAdmin or psql. There may be more than one way to answer certain questions (just choose one).

Part I – ZIP Code Business Patterns 2014 (ZBP) Table

1. Select all ZIP Codes with all columns that have more than 5000 employees.

2. Select all ZIP Codes in the Bronx that have more than 5000 employees.

3. Calculate the average number of employees for the entire table in a column called avg_emp.

4. For all ZIP Codes calculate the average payroll for employees in a new column called avg_pay. Sort the data and display the top ten ZIPs. Note: the payroll data is rounded to 1,000s of dollars, so multiply the result by 1,000 to get the right value. Make sure to exclude records where the employee value is null.

Part II – ZBP 2014 and USPS ZIP Code (ZIPs) Tables

5. Do a regular join between the zbp table and the zips table, to show which ZIPs are affiliated with which ZCTAs. Select just the zipcode and zipname from the zbp table and zcta from the zips table, and use aliases for the table names.

6. Do a left-outer join from the ZIP table to the ZBP table to identify which ZCTAs have no matching zipcode data. Sort the data in descending order by zcta. Select just the zipcode and zipname from the zbp table and zcta from the zips table, and use aliases for the table names.

7. Go back and do a regular join between the zbp table and the zips table. Aggregate and sum the establishments and employees from ZBP table by zcta. Use aliases for the table names and for the new columns.

Part III – American Community Survey (ACS) Table

8. Calculate the unemployment rate – the unemployed divided by total civilian labor force (lab_civilian) – for each ZCTA in a new column called unemp_rate. Do this for records where the number of unemployed is greater than 0. Since these columns are integers and you need a decimal result, use the cast operator:

(cast(lab_civunemp as numeric)/ cast(lab_civilian as numeric))*100

9. Select all records where the margin of error (_me) for the unemployed population is less than the unemployed population.

10. Now, calculate the unemployment rate for every ZCTA where the margin of error for the ZCTA’s unemployed population does not exceed the unemployed population.

Part IV – Written Responses

Please answer the following questions in two to four sentences.

11. What are the benefits for using aliases (with the AS statement) for column names and table names?

12. What is the difference between a regular (inner) join and a left-outer join? For what purpose would you use a left-outer join?

Leave a Reply

Your email address will not be published. Required fields are marked *