Monthly Archives: April 2019

Extra Credit – Review

Due Date: Apr 30, 2019

Introduction

  • In this optional exercise you’ll be working in the nys schema within the gep664_2019 database.
  • This exercise is worth 3 points. Each question is worth 1 point.
  • Be sure to follow the proper conventions for SQL statements and the guidelines for submitting assignments, as documented in the SQL Homework Guidelines. Failure to do so will result in point deductions.
  • For this assignment you will need to submit a SQL file and a map as an image file. When you submit your electronic assignment to the Box, attach the SQL file and the image as attachments in the same message.

Background

In this exercise you’ll be working in the nys schema within the gep664_2019 database. This schema includes tables with county-level census data that describe the economy and labor force in NY State. You’ll be using these county-level tables:

metros
counties that are part of metropolitan areas that fall within NY State
resworkers
number of workers who live in the county and their place of work from the 2012-2016 American Community Survey

Refer to the metadata table and back to assignment 3 if you want more information about these tables. You will also need to download a shapefile of NY state counties from the Data and Software page, Extra Credit data under Data for Homework.

Questions

For the following, provide the SQL statements and the record count that is automatically returned by pgAdmin.

1. Create a view called nys.metro_work_outside where you join the metro table to the resworkers table. For the CBSAs that are classified by type as Metropolitan Statistical Area, summarize the number of resident workers in one column and the number of workers who work outside their state of residence in another column by metro area. Sort the results by metro name in alphabetical order. The final result should represent one row for each metro (NOT one row for each county).

2. Load the shapefile of the NY State counties into the nys schema. When loading it you must define the system as NAD 83 SRID 4269 and rename the table counties. After the table is loaded, write an ALTER TABLE / COLUMN statement to transform the geometry to NYSP Central Zone, SRID 2261.

3. By joining the resworker and counties tables, create a new spatial table called nys.working_incounty that contains: county ID, county name, total resident workers, resident workers who work in their county of residence, a new column with the percentage of resident workers who work in their county of residence, and geometry of the counties. Remember there are multiple methods that you can use to create a new table from existing data – choose any one. To calculate the percentage you’ll need to use the CAST operator that we’ve seen before:

(CAST(column1 as numeric) / CAST(column2 as numeric))*100

Once you’ve created the table, add it to QGIS and create a map of the percentage of workers who work in their county of residence.

Assignment 10 – Organizing Spatial Data

Due date: Apr 16, 2019

Introduction

  • This assignment is dedicated to exploring what you would like to do for your final project.
  • This exercise is worth 6 points. Each question is worth 2 points each.
  • You do not have to use the SQL template for this assignment, as the questions do not involve writing code. Submit either a text file or a PDF.

Questions

Part I – Final Project Brainstorming

For this part, refer back to the journal articles that I emailed you, and to the description of the final project. You will need to do some background research on what you are thinking about pursuing for your project. Please allocate sufficient time for doing this: this will be your opportunity to receive feedback from me on your initial ideas.

1. Skim through all of the journal articles I sent you, choose one that interests you, and read it thoroughly. For the article that you chose, in your own words write a paragraph or two that explains what the authors’ research question or problem is, and describe how they used a spatial database to answer that question.

2. Write a paragraph or two that describes two or three final project ideas that you are considering pursuing. For each idea, explain what the research question or goal for the different projects would be.

3. For the ideas you have listed above, explore and list some possible data sources that you could use. Be specific: provide names of specific datasets or even variables – don’t just list general websites or repositories. For example: “I could use TIGER Line shapefiles for census tracts and educational attainment data from the latest 5-year American Community Survey from the Census Bureau”. Not: “I would use the census website and some shapefiles”.

Final Project

  • Presentation Due: Tue May 14, 2019 (in class, in Box by 5:30pm)
  • Final Project Due: Tue May 21, 2019 (by 9:30pm)

For the final project you will identify some publicly available datasets, build a workflow, and create a PostGIS database to answer an interesting spatial question of your choosing. You will also write a brief paper and create a map that describes your database.

See this document for full details. Our class on Tue Apr 9 will be devoted to discussing the project, and your homework assignment for that class will be dedicated to doing some background research for it.

Assignment 9 – Proximity Analysis

Due date: Apr 9, 2019

Introduction

  • In this exercise you’ll be working with the gep664 database in the nyc schema and the sample data from Chapter 10 in PostGIS in Action. It’s assumed that you have already loaded the ZCTA shapefile and census tract population centers into the nyc schema (which we did in class 7).
  • This exercise is worth 6 points. Each question is worth 1 point each.
  • Be sure to follow the proper conventions for SQL statements and the guidelines for submitting assignments, as posted on the course website. Failure to do so will result in point deductions.
  • For the following, provide the SQL statement and the row count.

Questions

This assignment will give you additional practice in working with the various methods for calculating distances, and with geography data types. In answering these questions, you should refer back to the slides from the lecture for class 9 and to Chapter 10 in PostGIS in Action.

Do NOT use ST_Buffer to answer any of these questions: use the distance functions (ST_DWithin, ST_Distance, etc). You do not have to use the KNN methods or operators unless it’s specifically asked for.

Part I – Measuring Distance with Geometry

In this first part, you’ll be working with data in the nyc schema; this includes the ZCTA feature that we loaded from a shapefile and the census tract population centers we loaded from a text file back in class 7. If you don’t have these files, download them from the course website (data files used for Class 7) and load them into the gep664 database. Refer to the in-class notes for class 7 for the steps we took to import both of these layers.

Remember the features in the nyc schema are in the NYSP Long Island spatial reference system, which uses feet. 5280 feet = 1 mile.

1. Calculate the distance from every subway station in the Bronx to every census tract population center (nyc.tract_popctr) in the Bronx. The unique id code for the Bronx is 36005; it appears in a dedicated column in the subway table, but is part of the geoid in the tract population center table. Your result should include the tract and station ids and names. Sort the data by stop_id and distance.

2. Select the ids of all census tract population centers that are within a distance of 1 mile from the two Yankee Stadium subway stations. Make sure to remove duplicates (a tract should only appear once in the results).

3. Sum the population for all census tract population centers that are within 1/2 mile of the Lincoln Center subway station. Your result should just show the total population for this area.

4. Using the KNN bounding box centroid operator , find the nearest 10 subway stations to ZCTA 10010. Return station ids and names.

Part II – Measuring Distance With Geography

In this part, you’ll be working with one of the sample data tables that’s included in the data for Chapter 10 of PostGIS in Action. A link for downloading this data is available on the course website; it’s a ZIP file that contains subfolders for every chapter in the book. You will work specifically with the airports table, which has a geography column.

Remember that units returned by the geography data type are in meters. 1000 meters = 1 kilometer.

5. For all of the airports that are of the type ‘large_airport’, measure the distance from each airport in Japan to each airport in Australia. Show the distance in kilometers, and the ids and names of the airports. Remember, to perform operations within a single table, you add the table to the FROM statement twice, and give each a different alias so you can reference them as separate entities (see the lecture slides for an example).

6. For all airports in Australia that are of the type ‘large airport’, return the closest airport of any type. Return distance in kilometers and names and ids of the airports. This question also requires you to perform operations within a single table. To remove results where each airport is being measured to itself, add an item to the WHERE clause to insure the airport id is not equal to the neighbor airport id: a.id != n.id. Hint – you can consider using the DISTINCT ON approach for this question.