Assignment 5 – Data Processing and SQL Review

Due date: Mar 12, 2019

Introduction

  • In this exercise you’ll be working with the copper data tables we processed in class and the gep664 database.
  • This exercise is worth 8 points. Each question is worth 1 point.
  • Be sure to follow the proper conventions for SQL statements and the guidelines for submitting assignments, as posted on the course website. Failure to do so will result in point deductions.

Questions

For the following, provide the SQL statement AND the record count that is automatically returned by pgAdmin (for the tables you import, verify the record count after import).

This assignment assumes that you have already processed and cleaned the copper data tables as we did in class (since we didn’t finish, I’ve posted 3 of the 4 finished tables in copper_working.xlsx on the Software and Data page). The original data files are also available for download. You will be importing data into the gep664 database.

When importing data into the database you may either use the pgAdmin GUI (create a blank table to hold the data, right click, choose Import) or the COPY command (create a blank table to hold the data, write a SQL COPY statement). The safest approach is to export the spreadsheets as CSV files (use Save As and select the comma-delimited format) and then import the CSVs into the database.

Remember: (1) be careful when assigning number types to columns, to ensure that you don’t truncate data, and (2) the order and number of columns in the database table must match the order and number in the import file, otherwise the import will fail. If you receive an encoding error when trying to import any table, try setting the encoding to WIN1252 in the import screen.

Part I – Table Creation and Import

For this exercise create a schema called copper for the user postgres (CREATE SCHEMA copper AUTHORIZATION postgres;) in the gep664 database. Then create well-structured tables with appropriate data types and keys, and import the copper data that we cleaned in class into the tables. Provide the CREATE TABLE statements and row counts.
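As a sketch of the create-then-import pattern described above (the table and column names here are illustrative placeholders, not the answer to any graded question — use the column lists given in the questions):

```sql
-- Illustrative only: a blank table in the copper schema to receive a CSV export.
CREATE TABLE copper.example_import (
    rec_id   integer PRIMARY KEY,
    country  varchar(100),
    oreton   numeric   -- numeric avoids truncating large or decimal values
);

-- Column order and count in the CSV must match the table definition.
-- The file path is a placeholder; point it at your exported CSV.
COPY copper.example_import
FROM 'C:\data\example.csv'
WITH (FORMAT csv, HEADER true, ENCODING 'WIN1252');
```

The ENCODING option is the COPY-statement equivalent of setting WIN1252 in the pgAdmin import screen.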

1. Create a new database table called smelters and import the copper smelter data.

2. Create a new database table called trade and import the copper ore trade data.

3. Create a new database table called mines_porcu and import the main porphyry copper mining data. As a reminder – in class we deleted several columns; these are the only ones that should be in the final main table: rec_id, depname, country, iso, stprov, latitude, longitude, oreton, cugrd, mogrd, augrd, aggrd, deptype, comments.

Part II – Data Cleaning and Import

For this part, you’ll work with the sedimentary copper mine table that we did not use in class, but that is stored with the other sample data in a folder called sedcu. It is a CSV file named main.csv and is similar to the porphyry table in structure. Import it into a spreadsheet for cleaning.

4. Using the ISO country code table that we used in class (country_codes.csv), write an Excel VLOOKUP formula to assign the proper 3-letter ISO country code to each country. For codes not found, fill them in manually. Instead of SQL code, submit the VLOOKUP formula as your answer and comment it out in the template. After the formula write one sentence that indicates which columns the formula is referring to for the lookup value, range, and returned value.
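For reference, a VLOOKUP for this kind of lookup might look like the formula below. The column positions are assumptions for illustration (country names in column C of the mine sheet; country_codes loaded into its own sheet with names in column A and ISO codes in column B) — adjust them to match your actual layout:

```text
=VLOOKUP(C2, country_codes!$A$2:$B$300, 2, FALSE)
```

FALSE requests an exact match, so countries whose names don’t match the code table return #N/A — those are the ones you fill in manually.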

5. Delete unnecessary columns and save the table. Create a new database table in the copper schema called mines_sedcu and import the main sedimentary copper mining data. Provide the CREATE TABLE statement and row count. The columns that should appear in the final main table are: rec_id, depname, country, iso, stprov, latitude, longitude, oreton, cugrd, cogrd, aggrd, cuton, deptype, comments. Most of the columns are similar to the porphyry table, but you will need to re-order them so they’re consistent.

Part III – Select and Summarize

For this part, you will create summary views in the copper schema for the data you just imported. Hint: experiment with writing your SELECT statement first to get it working, then once it’s right, create the view.
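The pattern the hint describes looks like this (all names here are placeholders, not the answer to any question below):

```sql
-- 1. Get the SELECT working on its own first:
SELECT country, SUM(oreton) AS total_ore
FROM copper.some_table      -- placeholder table name
GROUP BY country
ORDER BY total_ore DESC;

-- 2. Once the output is right, wrap the same statement in a view:
CREATE VIEW copper.some_summary AS
SELECT country, SUM(oreton) AS total_ore
FROM copper.some_table
GROUP BY country
ORDER BY total_ore DESC;
```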

6. Create a view called exports_5yr that sums the total copper ore exports by country for the past 5 years, and display just the top 20 countries by weight exported.

7. Create a view called import_smelt that sums the capacity of smelters by country and that shows the amount of copper ore imported into that country in 2015, for countries that have smelters and imports. The capacity of the smelters is in thousands of metric tons (TMT), while the weight of imports is in kilograms. 1,000 metric tons = 1,000,000 kilograms. Divide the imported ore by a million to get the amount of ore in TMT, and sort the data by country.
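The kilogram-to-TMT conversion can be done directly in the select list. A minimal sketch, assuming a column named imports_kg and a year column (both placeholders — use the actual column names from your trade table):

```sql
SELECT country,
       imports_kg / 1000000.0 AS imports_tmt  -- kg ÷ 1,000,000 = thousands of metric tons
FROM copper.trade
WHERE year = 2015;
```

Dividing by 1000000.0 (not the integer 1000000) keeps the result from being truncated if the column is an integer type.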

8. Create two separate views, one called pmine_total for porphyry mines and one called smine_total for sedimentary mines, each of which counts the number of copper mines (a count of records) and sums the total ore in tons available by country. Sort each view by number of mines in descending order.

BONUS Question. Optional, and worth 1 point. Count the mines and sum the ore totals for porphyry and sedimentary sites in one query to create a single view called copper.mines_total. You may not change the underlying structure of the tables or create different tables, but you may import the ISO country code table into the database as is and use it in the query. I have identified three possible solutions to this question: can you figure out one of them?