Comm3220: Data management for Decision Making

Contents

MP8: ACME Global

The Controller wants a detailed variance analysis

Goal

Learning how to use pivot tables in a Business Intelligence / Data Warehouse context: loading a pivot table from disparate data in a database.

Also, learning how to write business intelligence SQL queries that mimic the outcome of a pivot table, to understand similarities and differences between Pivot and SQL.

Context

Your experiences in retail and BI allowed you to land a new contract with Captech. Your new assignment is now ACME GLOBAL, a large global corporation. It is year-end, and the Controller wants you to perform a variance analysis of actual-versus-budgeted expenditures reported worldwide by the ACME branches. A staff member in the IT department tells you that all the five ACME geographical regions have already submitted their data in five tables in your DB at HQ (COMM3220). The tables are: ACME_N_AMERICA, ACME_S_AMERICA, ACME_PACIFIC_RIM, ACME_EUROPE, and ACME_ASIA.
At first you thought that you could do the analysis in Excel. You planned to compare actual and budgeted costs one by one and find material differences, but then you realized that each of the five ACME regions has about 3,000 rows of financial data. It did not take you much to conclude that you did not want to spend days doing this assignment, and that a better tool would be needed to speed up your work. You remembered that at McIntire you learned that both pivot tables or SQL can do the job way better than Excel.... no need to work nights and weekends!

Task

You will answer some questions twice, first using a Pivot table, and then using SQL.

1. Create a notebook called "MP8 YourLastname.ipynb".

2. Using SQL, Integrate the data from the five ACME regions into a single VIEW. No need to create an intermediate Table as you have done in the past. The view shows all the rows from the five tables and adds a new column called "Variance", where Variance = Budgeted-Actual expenses. Name the view <yourUserId>.ACME_GLOBAL_DATA.

3. Add a DROP IF EXISTS for the view at the top of the notebook. This is for the TA to check your work. Do not drop the tables with the data from the five regions.

4. In Excel, go to Data > Get data > From Database > From SQL Server database > enter the sever name (f-sg6m-s4.comm.virginia.edu) and DB (COMM3220) > If necessary, select "Database" to enter your DATABASE credentials (not NetBadge nor CommBadge). OK if not encrypted. > Select the view that you just created > Click on "Load to" and choose "Table" and an appropriate place.

5. Click anywhere inside the Table you just created, then go to Insert > Pivot Table.

6. If you are not an expert with Pivot Tables, watch this video. It is 20 minutes long, but it is worth watching to the end because it will teach you how to use Pivot tables for this assignment and - more importantly - for your next job. The first ten minutes cover the basics (skip of you know them), the second half covers more advanced functionality.

7. Use the Pivot Table to answer the business questions Q1-Q8 below. For each question, create a concise Pivot Table that answers the question, then highlight in yellow the cell or cells that contain the answer, and take a screenshot of the entire pivot table. Windows has an easy-to-use utility called 'Snip & Sketch'. MacOS has 'Screenshot'. A clear pic with your phone will also do.

8. Save the query texts and the screenshots in a .docx file. Call it "MP8 YourLastname.docx" and post it in myFiles.

9. Add to the notebook created in step 1 the text of the queries Q1-Q6 (yes: only Q1-Q6) and the SQL that answers them as you have done in the past: Q1 text, then SQL, then Q2 text, then SQL ... etc.

10. Post the notebook "MP8 YourLastname.ipynb" in myFiles.

Criteria

PIVOT: The highlighted cells must show only what is asked, and each screenshot should show the entire pivot. Format all $ numbers ('$', commas, but no decimal). Ok if you show grand totals. Answer each question with one pivot table.

SQL: Your SQL must reproduce as closely as possible the same data shown in the pivot, including formats, orders of columns and rows. All columns must have easy-to-understand names (ok if not identical to the pivot). It is understood that 'identical' is sometimes not possible.

Tips

This assignment is not only about producing the right answers but also figuring out how to produce them using the two tools, so you can compare and learn both. As always, Google, GenAI, and a healthy does of trial and error are your friends here.

A challenge in this project is to figure out what each question is asking. Variances are signed numbers. Negative variances are cases where actual expenses are larger than the corresponding budgets. So "a large negative variance" means a negative number with a large absolute value. By contrast, a variance is "small" if it is close to $0, no matter if it is positive or negative. A big variance has a large absolute number.

Business questions

Q1. Which division had the largest total negative variance in May? Show all total variances for May, largest total negative variance on top. Do not show other months.

Q2. Within that division and month, which department? Show all total departmental variances, largest total negative variance on top.

Q3. Which department had the most accurate bonus budget across the year and globe? Show all departmental bonus budgets and their variances (most accurate on top). In this case SQL is way easier. You will have to tinker a bit to get the Pivot to do it.

Q4. What are the top two sources (categories) of negative variance for R&D worldwide (show also the values)? Easy in SQL. Make the Pivot table only show the top two.

Q5. Show actual, budget, and variance totals for PR in Europe and Asia. Do not show data about other divisions.
If you cannot figure it out, show all Divisions.

Q6. Why is HR in Asia so over budget? Show the items, sorted by their contribution to the overage. Think about what 'contribution to the overage' means.

NOTE: The true power of PIVOT is revealed when we use the CONTENT of cells in a given column to create new columns. Make sure you understand this last sentence! For example, answering Q7 below requires to use the CONTENT of the cells in the 'month' column to create new columns ('Jan', 'Feb, 'Mar'....). This is trivial to do in the pivot table, but hard to do in SQL. SQL cannot easily do that, at least not without extensions and workarounds.

Q7. Produce a table showing total variances by division (row) and by month (column). No highlight necessary.

Q8. Produce a table showing actual expenditures by department (row) and geographical area (column). No highlight necessary.

MAC USERS: trying to do this project on a Mac (especially older versions) might not work.
Three alternative solutions:
1) Go to a lab in RRH or Shumway and use a windows PC (non McIntire: this requires permissions form the helpdesk. Do it in advance).
2) In at least one case a Mac user found a solution: the Mac Excel can connect to the COMM3220 DB, but it cannot see the Views that you create. So, as a fix, create the view (to satisfy the requirements of the project) but also create a table that looks like the view (so Excel on your Mac can see it).
This will create a table identical to the view (you will need to call it something different). At that point, your Mac should be able to connect and see the table, which will allow you to complete the project. Do not report in your homework the SQL for the table; just the SQL for the view.
3) If you still cannot connect to the server, go to ADS and select * from the view, then download to your PC the results from the select. Finally, load in Excel the file produced by ADS. Turn the data into a Excel table and you are good to go. Report in the notebook that this is what you did.