
(Solved): The purpose of the Data Analysis Project is to work with a team of your peers on a multi-week analys ...



The purpose of the Data Analysis Project is to work with a team of your peers on a multi-week analysis project composed of several smaller assignments. You will be placed in your teams by the end of Week 1.

Instructions

For the data analysis project, your team will be working with the Ames Housing dataset from Kaggle. Follow this link to see a full description of the dataset. The Excel data file called “Ames Housing Dataset.xlsx” is the same as the “train.csv” data file on Kaggle. A brief description of the data fields is available under Data on the competition page. You can find a detailed description of all data fields in the “data_description.txt” file if you sign up for a free account with Kaggle.

Ames Housing Dataset

Part I. Assigned Week 1, due by the end of Week 2

Start by downloading the data to your computer and getting familiar with the data set. Use Excel to answer the questions below. Use a separate worksheet in your Excel file to show your work for each question. Label worksheets using the Q_# format. Report your findings in the provided Word document template, and include all the relevant output from Excel. Submit both your Excel file and your written project report as a Word document.

1. List all the quantitative variables, in the order they appear in the dataset, in a separate worksheet named Q1.

2. In the worksheet named Data, create two new variables: house age at the time of sale, HouseAgeSale, and number of bathrooms, BRTotal. Show the formulas you use to create the new variables. Copy the two columns with the new variables into a new worksheet, Q2. Calculate the mean and standard deviation for HouseAgeSale and BRTotal.

3. Take a random sample of 250 observations, and save it in a separate worksheet labeled Q3_Sample. From this point on, continue working with your random sample of 250 observations.

4. Construct a histogram for SalePrice. Take the natural log of SalePrice and name the new variable LogPrice. Construct a histogram for LogPrice. Label the axes clearly and give titles to your histograms. Comment on the shape of the distribution for each variable. What do you observe after the log transformation?

5. Create a scatterplot for GrLivArea and SalePrice. Describe the relationship between the two variables, and point out any unusual features you observe.

6. Choose a categorical variable with at least four categories and construct a bar chart for it. Make sure your bar chart has a legend and is easy to read.
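The assignment asks for Excel, so the following is only a rough Python sketch that could be used to cross-check the spreadsheet results for the derived variables, the summary statistics, the random sample, and the log transformation. It makes several assumptions: the column names (YearBuilt, YrSold, FullBath, HalfBath, BsmtFullBath, BsmtHalfBath, SalePrice) follow the Kaggle train.csv header; BRTotal counts each half bath as 0.5, which the assignment does not specify; the four rows are made-up toy values, not real Ames records; and the helper names `add_derived` and `take_sample` are hypothetical.

```python
import math
import random
import statistics

# Toy rows for illustration only; these are NOT real Ames records.
rows = [
    {"YearBuilt": 2000, "YrSold": 2008, "FullBath": 2, "HalfBath": 1,
     "BsmtFullBath": 1, "BsmtHalfBath": 0, "SalePrice": 200000},
    {"YearBuilt": 1975, "YrSold": 2007, "FullBath": 1, "HalfBath": 0,
     "BsmtFullBath": 0, "BsmtHalfBath": 1, "SalePrice": 120000},
    {"YearBuilt": 2006, "YrSold": 2006, "FullBath": 2, "HalfBath": 0,
     "BsmtFullBath": 0, "BsmtHalfBath": 0, "SalePrice": 250000},
    {"YearBuilt": 1950, "YrSold": 2010, "FullBath": 1, "HalfBath": 1,
     "BsmtFullBath": 0, "BsmtHalfBath": 0, "SalePrice": 95000},
]


def add_derived(row):
    """Return a copy of the row with HouseAgeSale and BRTotal added."""
    out = dict(row)
    # House age at sale: year sold minus year built.
    out["HouseAgeSale"] = row["YrSold"] - row["YearBuilt"]
    # Assumption: a half bath counts as 0.5 of a bathroom.
    out["BRTotal"] = (row["FullBath"] + row["BsmtFullBath"]
                      + 0.5 * (row["HalfBath"] + row["BsmtHalfBath"]))
    return out


def take_sample(records, n, seed=0):
    """Draw n records without replacement; the seed makes it repeatable."""
    rng = random.Random(seed)
    return rng.sample(records, n)


data = [add_derived(r) for r in rows]

ages = [r["HouseAgeSale"] for r in data]
baths = [r["BRTotal"] for r in data]

# statistics.stdev is the sample standard deviation (n - 1 denominator).
age_mean, age_sd = statistics.mean(ages), statistics.stdev(ages)
bath_mean, bath_sd = statistics.mean(baths), statistics.stdev(baths)

# The assignment uses n = 250; the toy data only has four rows.
sample = take_sample(data, 3)

# Natural log of SalePrice, stored on copies so `data` is left unchanged.
sample = [dict(r, LogPrice=math.log(r["SalePrice"])) for r in sample]

print(f"HouseAgeSale: mean={age_mean:.2f}, sd={age_sd:.2f}")
print(f"BRTotal:      mean={bath_mean:.3f}, sd={bath_sd:.3f}")
```

In Excel terms, `statistics.stdev` corresponds to STDEV.S and `math.log` to LN, so the numbers should agree with the worksheet if the same bathroom definition is used. If your team defines BRTotal differently, change only the one line inside `add_derived`.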


