Contents
MP8: ACME Global
The Controller wants a detailed variance analysis
Goal
Learning how to use pivot tables in a Business Intelligence / Data Warehouse context: loading a pivot table from disparate data in a database.
Also, learning how to write business intelligence SQL queries that mimic the outcome of a pivot table, to understand similarities and differences between Pivot and SQL.
Context
Your experiences in retail and BI allowed you to land a new contract with Captech. Your new
assignment is now ACME GLOBAL, a large global corporation. It is year-end, and the Controller wants you to
perform a variance analysis of actual-versus-budgeted expenditures reported worldwide by the ACME branches.
A staff member in the IT department tells you that all the five ACME geographical regions have already
submitted their data in five tables in your DB at HQ (COMM3220). The tables are: ACME_N_AMERICA,
ACME_S_AMERICA, ACME_PACIFIC_RIM, ACME_EUROPE, and ACME_ASIA.
At first you thought that you could do the analysis in Excel. You planned to compare actual and budgeted
costs one by one and find material differences, but then you realized that each of the five ACME regions has
about 3,000 rows of financial data. It did not take you much to conclude that you did not want to spend days
doing this assignment, and that a better tool would be needed to speed up your work. You remembered that at
McIntire you learned that both pivot tables or SQL can do the job way better than Excel.... no need to work
nights and weekends!
Task
You will answer some questions twice, first using a Pivot table, and then using SQL.
1. Create a notebook called "MP8 YourLastname.ipynb".
2. Using SQL, Integrate the data from the five ACME regions into a single VIEW. No need to create an
intermediate Table as you have done in the past. The view shows all the rows from the five tables and adds a
new column called "Variance", where Variance = Budgeted-Actual expenses. Name the view
<yourUserId>.ACME_GLOBAL_DATA.
3. Add a DROP IF EXISTS for the view at the top of the notebook. This is for the TA to check your work. Do
not drop the tables with the data from the five regions.
4. In Excel, go to Data > Get data > From Database > From SQL Server database > enter the sever
name (f-sg6m-s4.comm.virginia.edu) and DB (COMM3220) > If necessary, select "Database" to enter your
DATABASE credentials (not NetBadge nor CommBadge). OK if not encrypted. > Select the view that you just
created > Click on "Load to" and choose "Table" and an appropriate place.
5. Click anywhere inside the Table you just created, then go to Insert > Pivot Table.
6. If you are not an expert with Pivot Tables, watch this video. It is 20 minutes long, but it is worth watching to the end because it will
teach you how to use Pivot tables for this assignment and - more importantly - for your next job. The
first ten minutes cover the basics (skip of you know them), the second half covers more advanced
functionality.
7. Use the Pivot Table to answer the business questions Q1-Q8 below. For each question, create a concise
Pivot Table that answers the question, then highlight in yellow the cell or cells that contain the answer,
and take a screenshot of the entire pivot table. Windows has an easy-to-use utility called 'Snip &
Sketch'. MacOS has 'Screenshot'. A clear pic with your phone will also do.
8. Save the query texts and the screenshots in a .docx file. Call it "MP8 YourLastname.docx" and post it in
myFiles.
9. Add to the notebook created in step 1 the text of the queries Q1-Q6 (yes: only Q1-Q6) and the SQL that
answers them as you have done in the past: Q1 text, then SQL, then Q2 text, then SQL ... etc.
10. Post the notebook "MP8 YourLastname.ipynb" in myFiles.
Criteria
PIVOT: The highlighted cells must show only what is asked, and each screenshot should show the entire
pivot. Format all $ numbers ('$', commas, but no decimal). Ok if you show grand totals. Answer each question
with one pivot table.
SQL: Your SQL must reproduce as closely as possible the same data shown in the pivot, including formats,
orders of columns and rows. All columns must have easy-to-understand names (ok if not identical to the
pivot). It is understood that 'identical' is sometimes not possible.
Tips
This assignment is not only about producing the right answers but also figuring out how to produce them using the two tools, so you can compare and learn both. As always, Google, GenAI, and a healthy does of trial and error are your friends here.
A challenge in this project is to figure out what each question is asking. Variances are signed numbers. Negative variances are cases where actual expenses are larger than the corresponding budgets. So "a large negative variance" means a negative number with a large absolute value. By contrast, a variance is "small" if it is close to $0, no matter if it is positive or negative. A big variance has a large absolute number.
Business questions
Q1. Which division had the largest total negative variance in May? Show all total variances for May, largest total negative variance on top. Do not show other months.
Q2. Within that division and month, which department? Show all total departmental variances, largest total negative variance on top.
Q3. Which department had the most accurate bonus budget across the year and globe? Show all departmental bonus budgets and their variances (most accurate on top). In this case SQL is way easier. You will have to tinker a bit to get the Pivot to do it.
Q4. What are the top two sources (categories) of negative variance for R&D worldwide (show also the values)? Easy in SQL. Make the Pivot table only show the top two.
Q5. Show actual, budget, and variance totals for PR in Europe and Asia. Do not show data about other
divisions.
If you cannot figure it out, show all Divisions.
Q6. Why is HR in Asia so over budget? Show the items, sorted by their contribution to the overage. Think
about what 'contribution to the overage' means.
NOTE: The true power of PIVOT is revealed when we use the CONTENT of cells in a given column to create new
columns. Make sure you understand this last sentence! For example, answering Q7 below requires to use the
CONTENT of the cells in the 'month' column to create new columns ('Jan', 'Feb, 'Mar'....). This is trivial
to do in the pivot table, but hard to do in SQL. SQL cannot easily do that, at least not without extensions
and workarounds.
Q7. Produce a table showing total variances by division (row) and by month (column). No highlight necessary.
Q8. Produce a table showing actual expenditures by department (row) and geographical area (column). No
highlight necessary.
MAC USERS: trying to do this project on a Mac (especially older versions) might
not work.
Three alternative solutions:
1) Go to a lab in RRH or Shumway and use a windows PC (non McIntire: this requires permissions form
the helpdesk. Do it in advance).
2) In at least one case a Mac user found a solution: the Mac Excel can connect to the COMM3220 DB, but it
cannot see the Views that you create. So, as a fix, create the view (to satisfy the requirements of the
project) but also create a table that looks like the view (so Excel on your Mac can see it).
This will create a table identical to the view (you will need to call it something different). At that
point, your Mac should be able to connect and see the table, which will allow you to complete the project.
Do not report in your homework the SQL for the table; just the SQL for the view.
3) If you still cannot connect to the server, go to ADS and select * from the view, then download to your PC
the results from the select. Finally, load in Excel the file produced by ADS. Turn the data into a Excel
table and you are good to go. Report in the notebook that this is what you did.