MIS 660 Descriptive and Predictive Analytics GCU
MIS 660 Full Course Discussions GCU
MIS 660 Topic 1 DQ 1
Suppose you wanted to estimate the average
household income of all Grand Canyon University (GCU) students. To expedite the
process, you only gather household income data from all your friends who major
in business at GCU. You then calculate the average income among your friends
and report that it represents the average income of all GCU students. Is this a
good approach? If not, how would you gather data to derive a better estimate?
Explain your answer.
MIS 660 Topic 1 DQ 2
Income data typically have some outliers. For
example, Tim Cook, CEO of Apple, Inc., had a salary of about $400 million in
2011. Suppose you had a data set of incomes in 2011 for all GCU faculty and Tim
Cook. Which measure of central tendency would you use when reporting on the
incomes in your data set if you do not want outliers to have much effect?
Explain your answer.
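As a rough illustration of the outlier effect described above, the following Python sketch compares the mean and the median. The faculty incomes are made-up placeholder values, not real GCU data; only the figure of about $400 million comes from the question.

```python
from statistics import mean, median

# Hypothetical faculty incomes in dollars (illustrative values only).
faculty_incomes = [62_000, 68_000, 71_000, 75_000, 80_000, 84_000, 90_000]

# Add one extreme outlier, using the "about $400 million" figure from the question.
incomes_with_outlier = faculty_incomes + [400_000_000]

mean_income = mean(incomes_with_outlier)
median_income = median(incomes_with_outlier)

# The mean is pulled far above every faculty income; the median barely moves.
print(f"Mean:   {mean_income:,.0f}")
print(f"Median: {median_income:,.0f}")
```

With these placeholder numbers, the mean lands in the tens of millions while the median stays among the faculty incomes.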
MIS 660 Topic 2 DQ 1
Suppose you had daily temperature data indicating
the “high” point of each day for 2015. If you want to show how the high differs
over time, what are some of the plot types that will allow you to do this? What
are some benefits to binning the data into one of 52 weeks and plotting the
average high for each week? Would it make sense to do something similar for the
four quarters in the year? Why or why not?
MIS 660 Topic 2 DQ 2
Many times, data are missing because of
various reasons. This poses some challenges when doing data analysis. For
example, suppose you wanted to do some analysis of the yearly incomes of the
faculty at GCU. When asked for their incomes, 25% of the faculty did not
participate in the survey; therefore, their incomes are missing from the
dataset. How would you summarize the income data in this case? Is it
appropriate to ignore the missing incomes and summarize the data without them?
Should you estimate the missing incomes, perhaps with the overall average, to
complete the data set?
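One way to explore the last question is to see what mean imputation does to the summary statistics. The sketch below uses invented incomes and assumes two of eight faculty (25%) did not respond; it is an illustration, not a recommended procedure.

```python
from statistics import mean, stdev

# Hypothetical observed incomes (the 75% who responded); illustrative values only.
observed = [50_000, 60_000, 70_000, 80_000, 90_000, 100_000]
n_missing = 2  # the 25% who did not respond

# Mean imputation: fill each missing income with the observed average.
observed_mean = mean(observed)
imputed = observed + [observed_mean] * n_missing

print(f"Mean, observed only:     {mean(observed):,.0f}")
print(f"Mean, after imputing:    {mean(imputed):,.0f}")
print(f"Std dev, observed only:  {stdev(observed):,.0f}")
print(f"Std dev, after imputing: {stdev(imputed):,.0f}")
```

The mean is unchanged, but the standard deviation shrinks, because every imputed value sits exactly at the center of the data. Neither version addresses the possibility that nonrespondents differ systematically from respondents.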
MIS 660 Topic 3 DQ 1
Data summarization is usually not enough when
performing analysis. Most of the time, adding context by telling a story about
the data is necessary to describe the analysis to others, especially those who
are not data-savvy. What are some general guidelines to follow to tell a good
data story? What story elements or structure should be used to organize the
presentation?
MIS 660 Topic 3 DQ 2
Consider your organization, or an
organization you are most familiar with. Explain the general process of data
aggregation for a typical metric (e.g., sales revenue, cost per unit, etc.)
used in the organization. What specific charts are commonly used to visually
depict the data? What might be some areas for improvement regarding how the
data is visually presented?
MIS 660 Topic 4 DQ 1
What are some of the limitations of using
Excel for pivot tables/charts? Why does that make software like Tableau more
appealing in the workplace?
MIS 660 Topic 4 DQ 2
Plotting summarized data will almost always
help to convey results more easily. However, there are situations where
plotting the summarized data instead of creating a simple table makes data
interpretation more difficult. Provide two examples of poor charts/graphs and
explain why they are difficult to interpret.
MIS 660 Topic 5 DQ 1
Data is useless without the skills to
manipulate, summarize, and analyze it. In fact, even after data is summarized
into a reporting format such as graphs and tables, it still requires someone to
add context and describe the results to fully explain the data. This can be
difficult, especially if data is being presented to nontechnical individuals.
Describe two techniques that can be used to better describe analysis results to
nontechnical individuals.
MIS 660 Topic 5 DQ 2
When most people think about data reporting
or visualization, they think about a nicely crafted graph that will not be
interactive with a user. Some new tools, such as Tableau, can create
visualizations that can interact with a user with informative pop-up
information, more drill-down information, and the ability to export filtered
results. Describe two benefits to having a user interact with a standard
report. Are there any drawbacks if the user modifies the report?
MIS 660 Topic 6 DQ 1
Summarize key data distribution concepts
including probability mass functions (PMF), probability density functions
(PDF), and cumulative distribution functions (CDF). Based on your organization
or any organization you are most familiar with, provide an example of a PMF, an
example of a PDF, and an example of a CDF, based on the type of data used in
the organization. How would you summarize each of these to someone who is
not familiar with each of these functions?
MIS 660 Topic 6 DQ 2
Suppose you had a six-sided die where each
number (1, 2, 3, 4, 5, and 6) has the same probability of showing up (1/6). If
the die is rolled an infinite number of times and the number recorded, what
will be the average value that shows up? Is the average value one of the actual
possibilities (1, 2, 3, 4, 5, or 6)? Why or why not?
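The long-run average of a fair die can be computed directly as a probability-weighted sum. A minimal Python sketch, using exact fractions to avoid rounding:

```python
from fractions import Fraction

faces = [1, 2, 3, 4, 5, 6]
probability = Fraction(1, 6)  # each face is equally likely

# Expected value: sum of each outcome times its probability.
expected_value = sum(face * probability for face in faces)

print(expected_value)           # exact value as a fraction
print(float(expected_value))    # same value as a decimal
print(expected_value in faces)  # whether the average is itself a possible roll
```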
MIS 660 Topic 7 DQ 1
Suppose you wanted to understand the
relationship between a customer’s yearly income (X) and the number of movies
(Y) the customer watched in a year. You then gather data on incomes and the
number of movies watched in a year. The range of incomes in your data set is
$5K to $150K. After fitting a simple linear model and performing all the
appropriate diagnostics, the model showed that, on average, for every $10K in
income, the customer watched 1.5 movies in the year. So, for example, if a
customer earned $60K in a year, he or she would be expected to watch nine movies
during the year. Now you want to apply this model to your very wealthy friend
who will earn $1 million in the next year. Is this an appropriate application
of your model? Why or why not? Provide specific examples to justify your
opinion.
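The extrapolation concern can be made concrete with a small Python sketch. It assumes a zero intercept, which is what the $60K example (nine movies) implies; the function names are hypothetical.

```python
SLOPE_PER_10K = 1.5                       # movies per $10K of income, from the question
INCOME_MIN, INCOME_MAX = 5_000, 150_000   # range of incomes in the fitted data

def predict_movies(income):
    """Predicted movies per year, assuming a zero intercept."""
    return SLOPE_PER_10K * income / 10_000

def in_observed_range(income):
    """True when the income lies inside the range used to fit the model."""
    return INCOME_MIN <= income <= INCOME_MAX

for income in (60_000, 1_000_000):
    label = "within data range" if in_observed_range(income) else "EXTRAPOLATION"
    print(f"${income:,}: {predict_movies(income):.0f} movies ({label})")
```

The arithmetic still produces a number for $1 million, but that income is far outside the $5K to $150K range the model was fitted on, so nothing in the data supports the linear trend continuing that far.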
MIS 660 Topic 7 DQ 2
If you regress daily high temperature (Y) on
the amount of ice cream sales (X), you will notice that there is a strong
positive correlation between the two. In other words, as daily ice cream sales
increase, the daily high temperature increases. This implies that if we knew
the amount of ice cream sales in a particular day, we could estimate, with a
high level of accuracy, the high temperature in that day. Does this mean that
if we wanted to increase the daily temperature, we need to sell more ice cream?
Explain why or why not.
MIS 660 Topic 8 DQ 1
Suppose you were asked to investigate which
predictors explain the number of minutes that 10- to 18-year-old students spend
on Twitter. To do so, you build a linear regression model with Twitter usage
(Y) measured as the number of minutes per week. The four predictors you include
in the model are Height, Weight, Grade Level, and Age of each student. You
build four simple linear regression models with Y regressed separately on
each predictor, and each
predictor is statistically significant. Then you build a multiple linear
regression model with Y regressed on all four predictors, but only one
predictor, Age, is statistically significant, and the others are not. What is
likely going on among the four predictors? If you include more than one of
these predictors in the model, what are some problems that can result?
MIS 660 Topic 8 DQ 2
After building a regression model and performing
residual diagnostics, you notice that the errors show severe departures from
normality and appear to have nonconstant variance. What steps would you take in
this case to resolve these problems? If the problems are not corrected after all
steps are taken, what does that imply about the modeling approach you are
taking? Explain in detail.
MIS 660 Full Course Assignments GCU
MIS 660 Topic 1 Aggregating Data
The purpose of this assignment is to use a spreadsheet to create a
visual representation of a data set.
For this assignment, you will use the “Heights” dataset. In the dataset,
the heights (in mm) of n = 199 married couples are recorded. The data comes
from a random sample from the much larger population of married couples. Complete
each of the steps below to create a visual representation of the dataset.
Part 1:
Using Excel functions, calculate the following summary values for each
of the three variables:
- Minimum
- First quartile
- Second quartile (Median)
- Third quartile
- Maximum
- Mean
- Range
- Sample standard deviation
- Sample variance
- Coefficient of variation
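The assignment calls for Excel functions, but the same ten summary values can be sketched in Python for a quick cross-check. The five heights below are invented placeholders, not the "Heights" dataset, and the "inclusive" quantile method is chosen on the assumption that it mirrors Excel's QUARTILE.INC interpolation.

```python
from statistics import mean, median, quantiles, stdev, variance

# Hypothetical heights in mm (placeholder values only).
heights_mm = [1590, 1620, 1650, 1680, 1710]

# Quartiles with inclusive interpolation between the min and max.
q1, q2, q3 = quantiles(heights_mm, n=4, method="inclusive")

summary = {
    "min": min(heights_mm),
    "q1": q1,
    "median": median(heights_mm),
    "q3": q3,
    "max": max(heights_mm),
    "mean": mean(heights_mm),
    "range": max(heights_mm) - min(heights_mm),
    "sample_sd": stdev(heights_mm),      # divides by n - 1
    "sample_var": variance(heights_mm),  # divides by n - 1
}
# Coefficient of variation, expressed as a percent of the mean.
summary["cv_percent"] = 100 * summary["sample_sd"] / summary["mean"]

for name, value in summary.items():
    print(f"{name:>10}: {value:,.2f}")
```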
Part 2:
Address each of the following questions in a written Word document.
- On average, are husbands or wives taller? What is the average
difference in millimeters between the two genders? Explain your answer.
- How would you interpret the median heights?
- Compare the means and the medians for each dataset. What initial
conclusions can be made here regarding the “contour” of each dataset?
- Compare the standard deviation values. Which dataset (husbands or
wives) has the most dispersion? What does your conclusion suggest?
- Given the answers in question 1, compare the variability of heights
between husbands and wives. Which partner type is more likely to have
extremely tall individuals (outliers)?
- Interpret the % coefficient of variation.
Part 3:
Your manager has requested some additional information from you
regarding the data. Specifically, you have been asked to calculate the
differences between “Male Heights” and “Female Heights.” Your manager is only
interested in married couples in which the husbands are taller than their
wives. Repeat the analyses requested in Part 1 for this new dataset. What
conclusions can be drawn here? Include discussion about whether outliers exist
in this dataset.
APA format is not required, but solid academic writing is expected.
This assignment uses a grading rubric. Please review the rubric prior to
beginning the assignment to become familiar with the expectations for
successful completion.
You are not required to submit this assignment to LopesWrite.
MIS 660 Topic 2 Data Manipulation
The purpose of this assignment is to use spreadsheet capabilities to
perform data manipulation and to explain the process used in the handling of
the data.
For this assignment, you will use the “Claims” dataset. In the dataset,
the claims data for n = 608 people are recorded. The data derive from a random
sample of females diagnosed with ischemic heart disease over 24 months (see
Exercise 7.27 in the textbook).
Instead of using urgent care centers, some people rely on the Emergency
Room (ER) to address most, if not all, of their medical needs. In fact, someone
who has three or more ER visits within 24 months is considered a high ER user.
Complete the steps below to execute this assignment.
- Using the dataset and Excel, create a new column titled
“High_ER_User” with “Yes” if three or more ER visits; otherwise “No.”
- Duration is measured in days, but 30-day intervals are more
appropriate for most reporting purposes. Using Excel, create a new column
titled “Duration_Months” by converting the duration into 30-day intervals.
- Many times complications and comorbidities are rare; therefore,
these two negative events are summed together. Using Excel, create a new
column titled “Comps_Comorbs” by adding complications with comorbidities.
- Many times age is grouped in 10-year intervals. Using Excel’s
VLOOKUP function, create a new column titled “Age_Group” with grouped ages
of “21-30 yrs,” “31-40 yrs,” and so on for 10-year intervals. The last age
group would be “61-70 yrs.” Use a tab titled “Age_Groups” for this task.
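The four derived columns above can be sketched in Python to check the logic before building the Excel formulas. The input key names (ER_Visits, Duration, Complications, Comorbidities, Age) are assumptions; the real dataset's headers may differ, and whether to round Duration_Months is left open here.

```python
from bisect import bisect_right

# Lookup table playing the role of the "Age_Groups" tab: sorted lower bounds and labels.
AGE_LOWER_BOUNDS = [21, 31, 41, 51, 61]
AGE_LABELS = ["21-30 yrs", "31-40 yrs", "41-50 yrs", "51-60 yrs", "61-70 yrs"]

def age_group(age):
    """Approximate-match lookup on a sorted table, similar in spirit to VLOOKUP."""
    if not 21 <= age <= 70:
        raise ValueError(f"age {age} is outside the 21-70 lookup range")
    return AGE_LABELS[bisect_right(AGE_LOWER_BOUNDS, age) - 1]

def derive_columns(row):
    """Return a copy of row with the four derived columns added."""
    out = dict(row)
    out["High_ER_User"] = "Yes" if row["ER_Visits"] >= 3 else "No"
    out["Duration_Months"] = row["Duration"] / 30  # days to 30-day intervals
    out["Comps_Comorbs"] = row["Complications"] + row["Comorbidities"]
    out["Age_Group"] = age_group(row["Age"])
    return out

# Hypothetical record for illustration.
example = derive_columns(
    {"ER_Visits": 4, "Duration": 360, "Complications": 1, "Comorbidities": 2, "Age": 47}
)
print(example)
```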
Next you will create a pivot table with the data and execute the
following (refer to the examples in the resource “Data Manipulation
Screenshots”).
- Use “High_ER_User” as a filter to obtain two filtered views of the
pivot table.
- Summarize the data to get counts of claims, sum of claims and
months, and average of procedures, prescribed drugs, ER visits, and
complications/comorbidities.
- Add a calculated field titled “Claims PM” to the pivot table. This
calculated field is the sum of claims divided by the sum of duration
months and measures the average claim amount per month (PM).
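The "Claims PM" field is a ratio of sums, which is generally not the same as averaging each person's monthly claim amount. A small Python sketch with two invented records shows the difference:

```python
# Hypothetical (claims_total, duration_months) pairs; not taken from the dataset.
rows = [(3000.0, 10.0), (1000.0, 2.0)]

sum_claims = sum(claims for claims, _ in rows)
sum_months = sum(months for _, months in rows)

# Ratio of sums: what the calculated field is defined to compute.
claims_pm = sum_claims / sum_months

# Average of per-person ratios: a different quantity, shown for contrast.
avg_of_ratios = sum(claims / months for claims, months in rows) / len(rows)

print(f"Claims PM (sum / sum):     {claims_pm:.2f}")
print(f"Average of per-row ratios: {avg_of_ratios:.2f}")
```

The ratio of sums weights each person by their duration, so long-duration records influence it more than short ones.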
APA format is not required, but solid academic writing is expected.
This assignment uses a grading rubric. Please review the rubric prior to
beginning the assignment to become familiar with the expectations for
successful completion.
You are not required to submit this assignment to LopesWrite.
MIS 660 Topic 3 Visual Representation of Data
The purpose of this assignment is to use pivot tables and pivot charts
to aggregate data and to explain the process used for data aggregation.
For this assignment, you will use the “Claims 2” dataset. Use Excel
pivot tables and pivot charts for this exercise.
Part 1:
Create a dashboard describing the data by age group (e.g., 21-30 yrs,
31-40 yrs, 41-50 yrs, 51-60 yrs, and 61-70 yrs). The dashboard should include
the graphs and charts listed in the locations described. The dashboard should
be a separate tab in Excel that only includes the five items below. A sample
layout is provided below the dashboard description.
- Top Left: Bar graph showing the
average number of ER visits for each of the five age groups. Show the
actual average values above each bar.
- Middle Left: Bar graph
showing the average number of procedures for each of the five age groups.
Show the actual average values above each bar.
- Bottom Left: Bar graph
showing the average claim cost for each of the five age groups. Show the
actual average values above each bar.
- Top Right: Pie chart showing the
percent of the total sum of all claim costs for each of the five age
groups. Show the actual percent values of each slice.
- Bottom Right: Line graph
showing the percent of each age group that has one or more ER visits. Show
the actual percent values of each group. To create this chart, first
create a new calculated column, named “Has ER Visit,” that is equal to 1
when the patient has had one or more ER visits; otherwise 0. HINT: The
average of a 0-1 column is a percent. Refer to the example in the resource
“Visual Representation of Data Screenshot: Preview of the Excel
Dashboard.”
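The hint about 0-1 columns can be verified with a short Python sketch. The records are invented for illustration; only the "Has ER Visit" rule (1 when there is at least one ER visit, otherwise 0) comes from the assignment.

```python
# Hypothetical (age_group, er_visits) records; illustrative only.
records = [
    ("21-30 yrs", 0), ("21-30 yrs", 2), ("21-30 yrs", 0), ("21-30 yrs", 1),
    ("31-40 yrs", 3), ("31-40 yrs", 0), ("31-40 yrs", 1), ("31-40 yrs", 2),
]

# Build the 0-1 "Has ER Visit" flag and collect it per age group.
flags_by_group = {}
for group, er_visits in records:
    has_er_visit = 1 if er_visits >= 1 else 0
    flags_by_group.setdefault(group, []).append(has_er_visit)

# The average of a 0-1 column is the share of rows equal to 1.
share_by_group = {g: sum(flags) / len(flags) for g, flags in flags_by_group.items()}

for group, share in share_by_group.items():
    print(f"{group}: {share:.0%}")
```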
Part 2:
Interpret the dashboard and the story it is attempting to tell users by
writing a 250-word summary that clearly describes the merits of each of the
charts used on the dashboard. For example, discuss why a pie chart might be
more appropriate than a bar graph for highlighting the information you want key
stakeholders to obtain by studying that content on the dashboard. Include specific
discussion about why each specific chart is used to illustrate the data it
represents.
APA format is not required, but solid academic writing is expected.
This assignment uses a grading rubric. Please review the rubric prior to
beginning the assignment to become familiar with the expectations for
successful completion.
You are not required to submit this assignment to LopesWrite.
MIS 660 Topic 4 Data Visualization With Tableau
The purpose of this assignment is to use data visualization tools to
aggregate and depict data and to interpret the data visualization results.
For this assignment, you will use the “Claims 2” dataset. You will use
Tableau to replicate the dashboard you created in the Topic 3 assignment. In
addition, you will compare and contrast the Excel and Tableau software.
Part 1:
Create a dashboard describing the data by age group (e.g., 21-30 yrs,
31-40 yrs, 41-50 yrs, 51-60 yrs, and 61-70 yrs). The dashboard should include
the graphs and charts listed in the locations described. The dashboard should
be submitted as a Tableau file. A sample layout is provided in the resource,
“Data Visualization With Tableau Screenshot: Preview of the Tableau Dashboard.”
- Top Left: Bar graph showing the
average number of ER visits for each of the five age groups. Show the
actual average values above each bar.
- Middle Left: Bar graph
showing the average number of procedures for each of the five age groups.
Show the actual average values above each bar.
- Bottom Left: Bar graph
showing the average claim cost for each of the five age groups. Show the
actual average values above each bar.
- Top Right: Pie chart showing the
percent of the total sum of all claim costs for each of the five age
groups. Show the actual percent values of each slice.
- Bottom Right: Line graph
showing the percent of each age group that has one or more ER visits. Show
the actual percent values of each group. To create this chart, first
create a new calculated column, named “Has ER Visit,” that is equal to 1
when the patient has had one or more ER visits; otherwise 0. HINT: The
average of a 0-1 column is a percent.
Part 2:
In 250 words, compare and contrast the use of Excel and Tableau in data
visualization. Include specific discussion about the following in your summary.
- Software ease of use.
- Software visualization capabilities.
- Software limitations.
- Discussion of when each of these software programs is most
appropriate for use.
APA format is not required, but solid academic writing is expected.
This assignment uses a grading rubric. Please review the rubric prior to
beginning the assignment to become familiar with the expectations for
successful completion.
You are not required to submit this assignment to LopesWrite.
MIS 660 Topic 5 Benchmark – Telling the Analytics Story
The purpose of this assignment is to create a data story and communicate
findings to key stakeholders.
Part 1:
For this assignment, you will use the “Arizona Incomes by Zip Code”
dataset. You will use Tableau to create a data story that illustrates the
median household incomes of Arizona residents. The data provided includes all
zip codes in Arizona. Each record is a unique zip code. The data includes the
following columns:
- Zip_Code: An Arizona zip code.
- Metro_Area: Whether or not
the zip code is within the Phoenix-metro area.
- City: The city name of the zip
code.
- Median_Income: The median
household income of each zip code based on 5-year estimates (2010-2014)
from the U.S. Census Bureau.
The marketing manager has asked you to analyze income data for Arizona residents
so that leaders in his department can determine the company’s advertising
strategy. The marketing manager intends to share this data with other decision
makers and the marketing department staff so that everyone has a thorough
understanding of how the income information can be used to determine specific
target markets in upcoming advertising campaigns. Because most of these
individuals do not have a strong understanding of analytics, they must be able
to gain the information listed below from studying the charts. In the data story,
use visualizations, heat maps, boxplots, etc. to describe the following:
- How do incomes differ across zip codes within the Phoenix-metro
area (using geo-mapping)?
- What is the relative difference in incomes across zip codes within
the Phoenix-metro area (use a heat map)?
- What is the distribution of incomes across Arizona?
- How do incomes within the Phoenix metro area differ from those
outside of that area?
Part 2:
Demonstrate the ability to communicate the analytics story to key
stakeholders, including the marketing manager, by creating a 6-10 slide
PowerPoint presentation (with speaker notes for each slide). Use the charts
generated in Tableau to illustrate the data story as it relates to the income
of Arizona residents. The slides and speaker notes should address the following
for each chart presented.
- What information is the chart providing to stakeholders?
- Why is the information in the chart important to key stakeholders?
- How can this information be used in making decisions about how
marketing dollars can be allocated?
Refer to the resource, “Creating Effective PowerPoint Presentations,”
located in the Student Success Center, for additional guidance on completing
this assignment in the appropriate style.
While APA format is not required for the body of this assignment, solid
academic writing is expected, and documentation of sources should be presented
using APA formatting guidelines, which can be found in the APA Style Guide,
located in the Student Success Center.
This assignment uses a rubric. Please review the rubric prior to
beginning the assignment to become familiar with the expectations for
successful completion.
You are not required to submit this assignment to LopesWrite.
Benchmark Information
This benchmark assignment assesses the following programmatic
competencies:
MS Business Analytics
1.4: Utilize data visualization techniques to communicate findings.
MIS 660 Topic 6 Data Distributions
The purpose of this assignment is to apply data distributions to discrete
and continuous data and justify the selection of the distributions.
For this assignment, you will use the “Random Variables” dataset. You
will use SPSS to analyze the dataset and address the questions presented.
Findings should be presented in a Word document along with the SPSS outputs.
Part 1:
Identify if the following random variables are discrete or continuous.
- Number of defective items in a shipment.
- Height of males (in mm) who attend Grand Canyon University.
- Yearly income among all people in the United States.
- Whether or not a high school graduate is accepted into a college.
- Time that it takes for a person to run a mile.
- The number of emergency hospital visits that each person had in the
last 12 months.
Part 2:
Let X be a random variable representing the outcome of one roll of a
six-sided die that is not fair.
In fact, the die is designed to never result in a 1 or 6, while the other
outcomes (i.e., 2, 3, 4, and 5) are equally probable.
- What are the individual probabilities for all possible values
of X?
- What are the cumulative probabilities for all possible values
of X?
- What is = ?
- What is = ?
- What is = ?
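For the first two questions in Part 2, the individual and cumulative probabilities can be tabulated with a short Python sketch. It relies only on the stated design of the die (1 and 6 never occur; 2, 3, 4, and 5 are equally likely) and uses exact fractions.

```python
from fractions import Fraction
from itertools import accumulate

faces = [1, 2, 3, 4, 5, 6]

# PMF: zero for 1 and 6, and one quarter for each of 2, 3, 4, 5.
pmf = {x: (Fraction(1, 4) if x in (2, 3, 4, 5) else Fraction(0)) for x in faces}

# CDF: running total of the PMF, P(X <= x).
cdf = dict(zip(faces, accumulate(pmf[x] for x in faces)))

print(" x  P(X = x)  P(X <= x)")
for x in faces:
    print(f"{x:>2}  {str(pmf[x]):>8}  {str(cdf[x]):>9}")
```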
Part 3:
The dataset provided consists of the following random variables:
- BMI: The body mass index of a random set of people.
- Distance: The distance (in feet) that
a baseball player hit the ball.
- Height: The height of males (in
mm).
- Income: The income (in dollars) of
people in a large company.
- Pass: The outcome when taking an
exam (1=Pass; 0=Fail).
- Wait Time: The time (in minutes) that
it takes when waiting for the train.
Answer each question below. Use SPSS as needed, and include the software
outputs as part of the Word document you submit.
- What is a Q-Q plot?
- Given a set of realized values of a random variable, how can a Q-Q
plot be used to assess the distribution of the random variable?
- Using histograms and Q-Q plots (except for binomial), match each
random variable to one of the following distributions: Binomial (with N=1,
P=0.7), Chi-square (with d.f.=20), Exponential, Lognormal, Normal, and
Uniform.
APA format is not required, but solid academic writing is expected.
This assignment uses a grading rubric. Please review the rubric prior to
beginning the assignment to become familiar with the expectations for
successful completion.
You are not required to submit this assignment to LopesWrite.
MIS 660 Topic 7 Simple Regression Analysis
The purpose of this assignment is to apply simple regression concepts,
interpret simple regression analysis models, and justify business predictions
based upon the analysis.
For this assignment, you will use the “Trucks” dataset. You will use
SPSS to analyze the dataset and address the questions presented. Findings
should be presented in a Word document along with the SPSS outputs.
The business characteristics of n = 250 U.S. trucking and delivery
companies for calendar year 2011 were recorded. Among the characteristics
studied were the number of drivers and the number of trucks (power units) each
company employed.
Part 1:
Given that the data consists of counts and the range of counts is large, a
natural log transformation is usually performed to improve the linear model
results. Apply a natural log transform to both variables and then plot Y =
ln(Trucks) vs. X = ln(Drivers).
Is there a linear relationship? Justify your answer by providing the
SPSS output as an illustration.
Part 2:
Build a simple linear model by regressing Y on X and testing whether or
not a relationship exists between the number of drivers and the number of
trucks. Address the following questions in your written response:
- After fitting the model, plot the standardized residuals (on
vertical axis) vs. the standardized predictions (on horizontal axis). Is
there a pattern? How would you interpret the pattern or lack of pattern?
- After fitting the model, derive the normal probability plot and
interpret what the plot means.
- What is the coefficient of determination, R², of the model? How would you interpret the R²?
- What is the estimate of β1? How would you interpret the estimate of β1?
- Is the estimate of β1 significantly different from 0? Assume α = 0.01.
- What is a 95% confidence interval for β1? How would you interpret the 95% confidence interval for β1?
- If a new trucking and delivery company with 4,900 drivers were to
be formed, how many trucks would you expect the company would employ based
on the model?
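Because both variables are log-transformed, a prediction for a given number of drivers has to be converted back from the log scale. The Python sketch below shows that step only; the coefficients are placeholders, not estimates from the "Trucks" dataset, and `predict_trucks` is a hypothetical helper name.

```python
import math

def predict_trucks(drivers, b0, b1):
    """Predict trucks from drivers using ln(Trucks) = b0 + b1 * ln(Drivers)."""
    log_trucks = b0 + b1 * math.log(drivers)
    # Back-transform from the natural-log scale to a count of trucks.
    return math.exp(log_trucks)

# Placeholder coefficients for illustration only; substitute the SPSS estimates.
B0_HYPOTHETICAL = 0.1
B1_HYPOTHETICAL = 0.95

estimate = predict_trucks(4900, B0_HYPOTHETICAL, B1_HYPOTHETICAL)
print(f"Estimated trucks for 4,900 drivers: {estimate:,.0f}")
```

Exponentiating a prediction made on the log scale is a common simple back-transform; it is usually read as a typical (median-like) value rather than an exact mean, which is worth noting when interpreting the result.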
APA format is not required, but solid academic writing is expected.
This assignment uses a grading rubric. Please review the rubric prior to
beginning the assignment to become familiar with the expectations for
successful completion.
You are not required to submit this assignment to LopesWrite.
MIS 660 Topic 8 Multiple Regression Analysis
The purpose of this assignment is to apply multiple regression concepts,
interpret multiple regression analysis models, and justify business predictions
based upon the analysis.
For this assignment, you will use the “Strength” dataset. You will use
SPSS to analyze the dataset and address the questions presented. Findings
should be presented in a Word document along with the SPSS outputs.
The compressive strength (Y) of concrete is influenced by the mixing
proportions and by the time that it is allowed to cure, although the exact
relationship between the strength and the components is unknown. The provided
data includes the results of n = 1030 concrete strength experiments that
include the following:
- Strength (in MPa): The compressive
strength of the concrete.
- Age (in days): The number of days the concrete was
allowed to cure.
- Coarse_Aggregate (in kg/m³): The proportion of coarse aggregate in the mix.
- Fine_Aggregate (in kg/m³): The proportion of fine aggregate in the mix.
- Cement (in kg/m³): The proportion of cement in the mix.
- Slag (in kg/m³): The proportion of furnace slag in the mix.
- Superplasticizer (in kg/m³): The proportion of plasticizer in the mix.
- Water (in kg/m³): The proportion of water in the mix.
- Ash (in kg/m³): The proportion of fly ash in the mix.
Part 1:
Derive various transformations of compressive strength to determine
which transformation, if any, results in a variable that most closely mimics a
normal distribution. To do this, plot Q-Q plots after each transformation
listed below, and decide which one should be used to build a multiple linear
model. Explain your answer and provide the SPSS output as an illustration.
- Strength (no transformation)
- Square root of Strength
- Squared Strength
- (Natural) Log of Strength
- Reciprocal of Strength
Part 2:
Based on the transformation selected in Part 1, build a multiple linear
regression model with all eight predictors.
- Use t-tests to determine if any of the predictors significantly
affect the compressive strength of concrete. Explain why each variable
should or should not be included in the model. Assume α = 0.05. Show the
appropriate model results to explain your answer.
- If any predictors from question 1 are found to be not significant,
remove them and re-run the model to create a reduced model (RM). Are all
the remaining variables still statistically significant? Show the appropriate
model results to explain your answer.
- Based on the RM, should there be concern about multicollinearity
among the predictors selected? Show the appropriate model results to
explain your answer.
- After fitting the RM, derive the residual plot (standardized
residuals vs. standardized predicted values) and normal probability plot.
Interpret each plot.
- What is the coefficient of determination, R², of the RM? How would you interpret the R²?
- Based on the RM, if the estimated compressive strength is currently 50 MPa,
what would the new estimate be after a 10-day increase in curing time?
Assume all other predictors are held constant.
- How would you interpret the intercept (constant) in the RM? Does
the interpretation make sense given the data you used to build the RM?
Part 3:
Given the components and aging time below, what is the
estimated compressive strength based on the RM?
- Age: 50 days
- Coarse_Aggregate: 900 kg/m³
- Fine_Aggregate: 600 kg/m³
- Cement: 300 kg/m³
- Slag: 200 kg/m³
- Superplasticizer: 7 kg/m³
- Water: 190 kg/m³
- Ash: 70 kg/m³
Part 4:
What is a 95% confidence interval of the
estimate in Part 3? How would you interpret the 95% confidence interval? (Hint: Use the SPSS
scoring wizard to address this question.)
APA format is not required, but solid academic writing is expected.
This assignment uses a grading rubric. Please review the rubric prior to
beginning the assignment to become familiar with the expectations for
successful completion.
You are not required to submit this assignment to LopesWrite.