UMGC IFSM 330 Candy Assignment

 

Attachments:

· 2017_product_data_students-final.csv

· 2018_product_data_students-final.csv

· 2019_product_data_students-final.csv

· Candy_part_1_skeleton_for_students.SQL

Your company wants to merge its old product order data into a new data mart to facilitate analysis. You have been tasked with writing an ETL (extract, transform, and load) code sequence, and executing it on three years’ worth of order data.

In this assignment, you will produce SQL code which scrubs and imports each of the three years’ worth of data, and produces an output file called stagingTable.

Along with these instructions, there is another document, ‘Additional Clarification on the Week 6 Candy Assignment’. Please read that document carefully.

You should also read the ‘Data Notes’ in Part C of this document. It is very important that you understand the data and how it changes over the three years, so you can create a ‘stagingTable’ that effectively combines data that might have been captured in different ways over the years.

Let’s get started!

Part A: Upload all the files you will need to SQLite:

1. Import the file called “2017_product_data_students.csv” to SQLiteonline.com. When you import it, give it the table name “pd2017” (no quotes) and set the column name to “First line.”

 

2. Import “2018_product_data_students.csv” as “pd2018”

3. Import “2019_product_data_students.csv” as “pd2019”

4. If you SELECT * FROM pd2017, you should see something like the below screenshot. Note you should see all three of the import tables on the left, and the pd2017 data should match what is shown as selected.

 

Part B: Extract and Transform your data

Your job is to use SQL to perform an ETL which will accomplish the following:

1. Start with the skeleton starter script we give you, attached to this assignment. Modify the CREATE TABLE command so the schema is as follows:

 

Computer code:

DROP TABLE IF EXISTS stagingTable;

Notes: Leave this code alone; it ensures you a fresh start.

Computer code:

CREATE TABLE stagingTable (
yearInt INT(4),
monthInt INT(2),
-- there will be more you need to fill in here
);

Notes: Right now this creates stagingTable with only two fields, yearInt and monthInt. Modify the code where highlighted in yellow (the placeholder comment) to correspond to the schema below. And of course, when you’re done coding, remove the ‘-- there will be more you need to fill in here’ comment.

 

2. Get the 2017 bit of the script working.

 

Computer code:

-- Insert 2017 Data

Notes: This is a comment telling you the 2017 data is going to be inserted here.

Computer code:

INSERT INTO stagingTable("monthInt", "state", "country", "region", "Product_Name", "unitPrice" -- you need to fill this in here
)

Notes: After you have created stagingTable, this is the first half of a command that will insert the data into stagingTable. You need to replace the yellow highlighted material (the placeholder comment) with your own code to complete it.

Computer code:

SELECT "Month", "State", "Country", "Region", "Product", "Per-Unit_price", "Quantity", "Order_Total" FROM pd2017
;

Notes: This is the second half of a command that will insert data into stagingTable. You don’t need to change anything here.

Computer code:

UPDATE stagingTable SET yearInt=2017;

Notes: This sets the year to 2017 for this data. You don’t need to change this.

3. Get the 2018 part of the script working.

 

Computer code:

-- Insert 2018 Data

Notes: Comment.

Computer code:

INSERT INTO stagingTable( -- you need to fill this in here
)

Notes: This is the first half of a command that will insert the data into stagingTable. You need to replace the yellow highlighted material (the placeholder comment) with your own code to complete it. See the rules below for more details.

Computer code:

SELECT … FROM pd2018
;

Notes: This is the second half of a command that will insert data into stagingTable. You need to replace the yellow highlighted material.

Computer code:

UPDATE stagingTable SET yearInt=2018 WHERE yearInt ISNULL;

Notes: This sets the year to 2018 for any new entries which don’t yet have a year. You don’t need to change this.

4. Get the 2019 part of the script working.

 

Computer code:

-- Insert 2019 Data

Notes: Comment.

Computer code:

INSERT INTO stagingTable( -- you need to fill this in here
)

Notes: You need to replace the yellow highlighted material (the placeholder comment) with your own code to complete it. See the rules below for more details.

Computer code:

SELECT … FROM pd2019
;

Notes: You need to replace the yellow highlighted material.

Computer code:

UPDATE stagingTable …;

Notes: You need to replace the yellow highlighted material.

5. The script will load the data into one final table and call it stagingTable.

6. Run the checksum script to verify you have the stagingTable calculated correctly.

7. Export your final output table under the name “XX_output_final.csv” where XX are your initials. To export this, you can just use the Export button on the SQLite menu (it’s right next to the Import button).

You should do this all in SQLite. You should not export to Excel and do your manipulations in Excel.
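
The insert-then-tag pattern used in steps 2 to 4 can be tried out on toy data before touching the real tables. The sketch below is only an illustration: it drives SQLite from Python's built-in sqlite3 module so it can be run anywhere, and the tables demo2017, demo2018 and demoStaging (and their two data columns) are hypothetical stand-ins, not the assignment's schema. It shows why the second UPDATE needs the WHERE yearInt ISNULL filter: without it, the 2017 rows would be overwritten with 2018.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical toy tables; names and columns are illustrative only.
cur.executescript("""
CREATE TABLE demo2017 (Month INT, Product TEXT);
CREATE TABLE demo2018 (Month INT, Product TEXT);
INSERT INTO demo2017 VALUES (7, 'Orange Creepies'), (9, 'Farm Fresh');
INSERT INTO demo2018 VALUES (1, 'PearApple');

CREATE TABLE demoStaging (yearInt INT, monthInt INT, Product_Name TEXT);

-- First year: every row currently in the staging table belongs to 2017.
INSERT INTO demoStaging (monthInt, Product_Name) SELECT Month, Product FROM demo2017;
UPDATE demoStaging SET yearInt = 2017;

-- Second year: only the freshly inserted rows still have a NULL year.
INSERT INTO demoStaging (monthInt, Product_Name) SELECT Month, Product FROM demo2018;
UPDATE demoStaging SET yearInt = 2018 WHERE yearInt ISNULL;
""")

rows = cur.execute(
    "SELECT yearInt, monthInt, Product_Name FROM demoStaging ORDER BY yearInt, monthInt"
).fetchall()
for row in rows:
    print(row)
```

The same SQL statements can be pasted into SQLite directly; the Python wrapper is only there to make the sketch self-checking.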

Part C: 2017 Data Notes

Your 2017 order data is contained in the attached file, “2017_product_data_students.csv” and you should have imported it as “pd2017.” A sample of this file’s type of data is shown below in Table 1 Sample of order data from 2017. (Note your file may or may not have the same data in it.)

Your field definitions follow:

· Month: integer, corresponds to the month of the sale. For example, 5 = May.

· Country: text, should all be USA. (All data in this exercise should be USA.)

· Region: text, represents the regions within the country.

· State: text, USPS state abbreviations. Each state is within one region.

· Product: text. This is the name of a packaged food product.

· Per-unit price: integer. This represents the per-unit price in cents; for example, 300 indicates that Orange Creepies sell for $3.00 per package. (For the purposes of this exercise, disregard all currency formatting and just use 300 to represent $3.00.)

· Quantity: integer. This represents how many items were in that particular order. The first order here was for 49 packages of Orange Creepies.

· Order Total: integer. This is the per-unit price x the quantity. The first line here indicates that 300 x 49 = 14700 (or $147.00) was the price of the first order.

Table 1 Sample of order data from 2017

# | Month | Country | Region | State | Product | Per-Unit Price | Quantity | Order Total
0 | 7 | USA | West | CA | Orange Creepies | 300 | 49 | 14700
1 | 9 | USA | Northeast | RI | Farm Fresh | 365 | 49 | 17885
2 | 10 | USA | South | TN | Farm Fresh | 365 | 10 | 3650
3 | 12 | USA | South | FL | Organiks | 257 | 27 | 6939
4 | 12 | USA | South | MD | PearApple | 363 | 83 | 30129
5 | 8 | USA | South | KY | Big Waffle | 268 | 5 | 1340
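
The rule that Order Total equals per-unit price x quantity can be checked mechanically. The following is a minimal sketch, assuming a hypothetical table sample2017 that holds three rows copied from Table 1 and reuses the column names from the 2017 SELECT in the skeleton script; it is run through Python's built-in sqlite3 module purely for convenience. An empty result means every stored total agrees with the formula.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical toy table holding a few rows copied from Table 1.
conn.execute(
    'CREATE TABLE sample2017 ("Product" TEXT, "Per-Unit_price" INT, '
    '"Quantity" INT, "Order_Total" INT)'
)
conn.executemany(
    "INSERT INTO sample2017 VALUES (?, ?, ?, ?)",
    [
        ("Orange Creepies", 300, 49, 14700),
        ("Farm Fresh", 365, 49, 17885),
        ("Big Waffle", 268, 5, 1340),
    ],
)

# Rows where the stored total disagrees with price x quantity.
mismatches = conn.execute(
    'SELECT * FROM sample2017 WHERE "Per-Unit_price" * "Quantity" <> "Order_Total"'
).fetchall()
print(mismatches)
```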

2018 Data Notes:

Your 2018 order data is contained in the attached file, “2018_product_data_students.csv”

A sample of this file’s data is contained below as Table 2 Sample of order data from 2018. (Note your file may or may not have the same data in it.)

Your field definitions follow:

· Month: integer, corresponds to the month of the sale. For example, 5 = May.

· Region: text, represents the regions within the country.

· Customer_ID: integer, represents the customer’s unique Customer ID number.

· Product: text. This is the name of a packaged food product.

· Per-unit price: integer. This represents the per-unit price in cents; for example, 363 indicates that PearApple sells for $3.63 per package. (For the purposes of this exercise, you should disregard all currency formatting and just use 363 to represent $3.63.)

· Quantity_1: integer. This represents how many items were in the first shipment of that particular order. This year we had shipping problems, and could often not ship the entire order all at once. Orders were split into two shipments where necessary, and Quantity_1 reflects how many units were shipped first. (Assume all shipments were completed in the month listed, and that no shipments had the first shipment in one month and the second shipment in the subsequent month.)

· Quantity_2: integer. This represents how many items were in the second shipment of that particular order. A 0 indicates a second shipment was not necessary. To get the total number of items shipped, you need to add Quantity_1 and Quantity_2.

· The first line here reflects that PearApple has a first shipment of 25 units and a second shipment of 92 units, all within the month of January, for a total of 25 + 92 = 117 units.

Table 2 Sample of order data from 2018

# | Month | Region | Customer_ID | Product | Per-Unit Price | Quantity_1 | Quantity_2
0 | 1 | Midwest | 280 | PearApple | 363 | 25 | 92
1 | 5 | West | 545 | Orange Creepies | 300 | 87 | 79
2 | 8 | Northeast | 131 | Future Toast | 253 | 90 | 6
3 | 4 | Midwest | 920 | Farm Fresh | 365 | 33 | 74
4 | 12 | South | 358 | Rotpunkt | 220 | 4 | 13
5 | 1 | South | 855 | GMO Guardian | 176 | 17 | 45
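
As a worked illustration of the split-shipment arithmetic, the sketch below adds Quantity_1 and Quantity_2 and then applies the 2017 definition of an order total (per-unit price x quantity) to the combined quantity. Several things here are assumptions rather than facts from the assignment: the table sample2018 is a hypothetical stand-in, the price column is assumed to be named "Per-Unit_price" as in the 2017 file, and the result aliases quantity and orderTotal are illustrative. Python's built-in sqlite3 module is used only to make the SQL runnable on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical toy table with the first three rows of Table 2.
conn.execute(
    'CREATE TABLE sample2018 ("Product" TEXT, "Per-Unit_price" INT, '
    '"Quantity_1" INT, "Quantity_2" INT)'
)
conn.executemany(
    "INSERT INTO sample2018 VALUES (?, ?, ?, ?)",
    [
        ("PearApple", 363, 25, 92),
        ("Orange Creepies", 300, 87, 79),
        ("Future Toast", 253, 90, 6),
    ],
)

# Combine the two shipments, then price the combined quantity.
totals = conn.execute(
    '''
    SELECT "Product",
           "Quantity_1" + "Quantity_2" AS quantity,
           "Per-Unit_price" * ("Quantity_1" + "Quantity_2") AS orderTotal
    FROM sample2018
    ORDER BY rowid
    '''
).fetchall()
for row in totals:
    print(row)
```

For the first row this gives 25 + 92 = 117 units, matching the figure in the field definitions.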

2019 Data Notes:

Your 2019 order data is contained in the attached file, “2019_product_data_students.csv.”

A sample of this file’s data is contained below as Table 3 Sample of order data from 2019. (Note your file may or may not have the same data in it.)

Your field definitions follow:

· Month: integer, corresponds to the month of the sale. For example, 5 = May.

· Country: text, represents the country of the customer. Should all be USA.

· Region: text, represents the regions within the country.

· State: USPS code for the 50 United States.

· Product: text. Same as previous years.

· Per-unit price: integer. This represents the per-unit price in cents; same as previous years.

· Quantity: This represents how many items were in that particular order. The first order here was for 95 packages of Only Pancakes.

· Order Subtotal: This represents the order subtotal, calculated as per-unit price x quantity. For example, the first order here reflects a per-unit price of 413 cents x 95 units, for a subtotal of 39,235 (or $392.35).

· Quantity Discount: This represents the new policy (effective January 1, 2019) that all orders of 90 units and over automatically earn a 10% discount. An order of 89 units does not earn the discount; an order of 90 units does. All order discounts have been rounded to the nearest cent, so you can assume this field has no decimals in it. In the data below:

o Order 0, on the first line, of 95 Only Pancakes to Florida, did qualify for the Quantity Discount, because an order quantity of 95 meets the 90-unit threshold. The Quantity Discount has been computed as 3924, or 10% of 39235 (3923.5, rounded to the nearest cent). In this case, the final order total would be 39,235 - 3,924 = 35,311 (or $353.11).

o Order 4, on the fifth line, of 31 Future Toasts to North Carolina, did not qualify for the Quantity Discount. Therefore, the Order total would simply be the Order subtotal.

Table 3 Sample of order data from 2019

# | Month | Country | Region | State | Product | Per-Unit Price | Quantity | Order Subtotal | Quantity Discount
0 | 9 | USA | South | FL | Only Pancakes | 413 | 95 | 39235 | 3924
1 | 6 | USA | West | HI | Big Waffle | 268 | 93 | 24924 | 2492
2 | 5 | USA | Northeast | RI | Grey Gummies | 446 | 95 | 42370 | 4237
3 | 9 | USA | Midwest | NE | Funky Pops | 380 | 100 | 38000 | 3800
4 | 11 | USA | South | NC | Future Toast | 253 | 31 | 7843 | 0
5 | 8 | USA | West | WA | Mr Greens | 447 | 76 | 33972 | 0
6 | 8 | USA | South | MD | Giant Gummies | 347 | 93 | 32271 | 3227
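
The relationship between the subtotal, the discount and the final order total can be illustrated on a few rows of Table 3. This is a hedged sketch, not the assignment's solution: the table sample2019 is hypothetical, and the column names "Order_Subtotal" and "Quantity_Discount" are assumptions (the real file's headers may differ). It flags which orders meet the 90-unit threshold and subtracts the discount from the subtotal, using Python's built-in sqlite3 module to run the SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical toy table with three rows copied from Table 3.
conn.execute(
    'CREATE TABLE sample2019 ("Product" TEXT, "Quantity" INT, '
    '"Order_Subtotal" INT, "Quantity_Discount" INT)'
)
conn.executemany(
    "INSERT INTO sample2019 VALUES (?, ?, ?, ?)",
    [
        ("Only Pancakes", 95, 39235, 3924),
        ("Big Waffle", 93, 24924, 2492),
        ("Future Toast", 31, 7843, 0),
    ],
)

# qualifies is 1 when the order has 90 or more units, otherwise 0.
results = conn.execute(
    '''
    SELECT "Product",
           "Quantity" >= 90 AS qualifies,
           "Order_Subtotal" - "Quantity_Discount" AS orderTotal
    FROM sample2019
    ORDER BY rowid
    '''
).fetchall()
for row in results:
    print(row)
```

The first row reproduces the worked figure from the field definitions: 39,235 minus 3,924 gives 35,311.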

Part D: Check Your Own Work

1. You can run the following SQL code on your staging table. There is nothing to turn in from this bit. It should yield the following first few rows:

SELECT region, yearInt, monthInt, count(*) FROM stagingTable WHERE monthInt = 5 GROUP BY region, yearInt, monthInt;

 

2. You can also run the following code to debug. You should get the following rows:

SELECT yearInt, monthInt, state, customer_id, product_name, orderTotal FROM stagingTable

WHERE product_name = 'Big Waffle' AND monthInt = 4

ORDER BY product_name, yearInt, monthInt, state, customer_id, orderTotal;

 

Now that you’ve debugged your code, it’s time to get a checksum! Run the following code to get a checksum. The checksum will be a number. Put this checksum number at the top of your homework. See the table below for help with your CHECKSUM result.

select sum(yearInt * monthInt * orderTotal)%2341 as checksum from stagingTable;

3. Once you get the result of your CHECKSUM, look at the table below for ways to troubleshoot any issues with your ETL statements.

 

If you get this checksum: | Hint:
1021 | No further troubleshooting needed; go ahead and submit your 3 deliverables.
315 | Look at your calculation for OrderTotal for 2019.
349 | Look at your calculation for OrderTotal for 2018 AND 2019.
1944 | Look at your calculation for OrderTotal for 2018 AND 2019 (hint: subtotal is NOT equal to ordertotal).
1767 | Look at your calculation for the quantity total for 2018. Look at your calculation for OrderTotal for 2019.
953 | Look at your calculation for the quantity total for 2018. Look at your calculation for OrderTotal for 2018 AND 2019.
Other | Look carefully at your syntax for every field. Run your 2017 insert first, then run your 2018 insert, then your 2019 insert statement.
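
To see what the checksum query actually computes, it can help to run it against a tiny table and compare the result with the same arithmetic done by hand. The sketch below does this with a hypothetical three-row table toyStaging whose values are made up for illustration, so its checksum will not match any value in the table above. It multiplies year, month and order total for each row, sums the products, and takes the remainder after dividing by 2341, once in SQL and once in plain Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical three-row staging table; values are for illustration only.
conn.execute("CREATE TABLE toyStaging (yearInt INT, monthInt INT, orderTotal INT)")
toy_rows = [(2017, 7, 14700), (2018, 1, 42471), (2019, 9, 35311)]
conn.executemany("INSERT INTO toyStaging VALUES (?, ?, ?)", toy_rows)

# Same expression as the assignment's checksum query, run on the toy table.
checksum = conn.execute(
    "SELECT sum(yearInt * monthInt * orderTotal) % 2341 AS checksum FROM toyStaging"
).fetchone()[0]

# The same calculation done directly in Python for comparison.
expected = sum(y * m * t for y, m, t in toy_rows) % 2341
print(checksum, expected)
```

Because every row contributes to the sum, a single wrong orderTotal or a mislabeled year changes the remainder, which is why the checksum is a useful end-to-end check.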

TURN IN:

1. Your output file, called “XX_output_final.csv” where XX are your initials.

2. All the SQL code you used to execute this.

3. A document that contains

a. CHECKSUM: XXX where XXX is the checksum number produced. Put this in big font right on the top.

b. A one-page outline of your ETL process. Which functions did you use, and what logic did you follow? This should be at the level that your boss, who has an MBA but not an IT/database background, can follow. Do not use “computer-ese” here; use regular business English.
