Assignment 3 – Intro to SQL: Data Definition Language (DDL)

Due date: Feb 26, 2019

Introduction

  • In this exercise you’ll be working in the nyc and nys schemas within the gep664_2019 database.
  • This exercise is worth 7 points. Each question is work 1/2 a point, except the final question which is worth 1 point.
  • Be sure to follow the proper conventions for SQL statements and the guidelines for submitting assignments, as documented in the SQL Homework Guidelines. Failure to do so will result in point deductions.

For the following, provide the SQL statement that answers each question; provide the row count only if asked. There may be more than one way to answer certain questions (just choose one).

Part I – Weather Stations table (nyc schema)

In order to select weather stations by state, we currently have to use the LIKE statement to do pattern searching. This is extremely inefficient; let’s add a dedicated state column that will allow for more efficient queries.

1. Alter the weather stations table by adding a column called state that will hold the two-letter state code (i.e. NY, NJ).

2. Update the new state column by adding the two letter state code that indicates where the station is located. You can do this using either one or two statements.

Part II – NYC Borough Data (nyc schema)

You’ll get some practice in table creation by adding a couple of tables with data for the five boroughs of NYC. Each of the boroughs is also considered a county; a borough is a local designation for subdivisions of the City of New York, while counties are a federal designation for subdivisions of states.

3. Create a table called nyc.boroughs_area to hold the following data: codes and names for the boroughs, and their area in water and land in square miles (for column names, use the names in question 4). Designate a primary key, specify that borough names should not be null, and add a check constraint that land area must be greater than zero.

4. Insert the data into the new table with an INSERT INTO VALUES statement.

bcode,bname,area_wtr,area_lnd
bx,Bronx,15.38,42.10
bk,Brooklyn,26.10,70.82
mn,Manhattan,10.76,22.83
qn,Queens,69.68,108.53
si,Staten Island,43.92,58.37

5. Create a view called nyc.boroughs_sum that has the borough name and calculated fields that have the total land area, the percent total that is water, and the percent total that is land. Optionally – you can round the percentages to whole numbers.

6. Create a table called nyc.counties_pop to hold the county code, county name, borough code, and population for 2010 (for column names, use the names in question 7). Make county code the primary key and borough code the foreign key.

7. Insert the data into the new table with an INSERT INTO VALUES statement.

ccode,cname,bcode,pop2010
36005,Bronx County,bx,1385108
36047,Kings County,bk,2504700
36061,New York County,mn,1585873
36081,Queens County,qn,2230722
36085,Richmond County,si,468730

8. Create a new table called nyc.boroughs_pop with data from the borough area and county population tables that contains: the borough code as primary key, borough name, total population, and population density (land area divided by population).

Part III – Import Labor Force Table (nys schema)

We’ll import a table from the Census 2012-2016 American Community Survey that indicates whether resident workers in a county work in the same municipality in which they live. There is one record for each county in New York State, and like the previous ACS tables we’ve worked with each estimate column has a margin of error (me) column associated with it (for this exercise you can ignore the me columns).

Go to the Data & Software tab on the course website and download the homework data for Class 3, which is a CSV file for place of work by MCD of residence. This CSV has a header row and uses commas for delimiters.

9. In the nys schema create a table called nys.resworkers_mcd that will accommodate all of the data in the CSV file. Make sure to designate a primary key.

10. Use pgAdmin to insert the data from the CSV file into the table. Once it’s imported, write a basic select statement to select all the records to verify the result. Then write a second statement that returns all the counties where the number of workers who work in their MCD is greater than the number that work outside their MCD. In your answer provide both SELECT statements and the number of rows returned (via the Messages tab in pgAdmin).

Part IV – Written Responses

Please answer the following questions in two to four sentences.

11. What is the difference between a primary key and a foreign key?

12. What is a view and how is it different from a table? For what purpose would you create a view?

13. Imagine you have a data table with population data for places, which is a US Census term for cities and towns. The data includes: a 5-digit code that contains digits from 0 to 9 that uniquely identifies each place, a place name, the two-letter state abbreviation, a footnote that indicates whether a place is incorporated or not, the total area in square miles with two decimal places, the 2010 census population count, and a 2017 population estimate. Write a CREATE TABLE statement for this data, paying attention to best practices for creating names, assigning data types, and building well structured tables. Then, write a few sentences describing your rationale for creating the table the way that you did.