



Exercise 10-2:

Question: Use d311.csv as a data source, and perform the following steps:

1. Create a subdirectory SPRK/Ex2 in HDFS and upload the d311.csv file to that subdirectory.

2. Start the Spark shell and read the d311.csv file. View the schema and note that the column names match the record field names in the CSV file. Provide a screenshot of the schema.

3. Display the data in the DataFrame using the show function. How many records are displayed? Display the first five records of the DataFrame. Provide a screenshot of the result.

4. Use the count action to return the number of items in the DataFrame. Provide a screenshot of the result.

5. Use a select transformation to return a DataFrame with only the Created Date, Agency, Complaint Type and City. The select transformation should return all columns with an alias instead of the real name. Display the schema of the new DataFrame. Provide a screenshot of the result.

6. Write a query (a series of one or more transformations followed by an action) that displays the first 20 lines of Agency, City, Complaint Type, where City is not null. Provide a screenshot of the result.

7. Perform the same query as in #6, but this time execute a single command to show the same results. Provide a screenshot of the result.




Expert Answer



Step-1:-

This question is an exercise in big data analytics. It uses Apache Spark, a distributed computing framework, to load a large dataset, apply transformations and actions to it, and inspect the results.

More broadly, it belongs to data science and analytics, an interdisciplinary field that combines computer science, statistics, and domain expertise to extract insights from data.
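
The seven steps in the question can be sketched as a Spark shell session. This is a minimal sketch and not a verified solution. It assumes that d311.csv sits in the local directory the shell was started from, that the file has a header row, and that its columns are named exactly Created Date, Agency, Complaint Type and City as the question states. The alias names (created_date and so on) and the value names (df, selected, picked, withCity) are illustrative choices. Each statement is kept free of leading-dot continuation lines so it can be pasted into the shell line by line.

```scala
// Run inside spark-shell, where `spark` (a SparkSession) is already defined.
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.functions.col

// Step 1: create SPRK/Ex2 in HDFS and upload the file.
// Command-line equivalent, run outside the shell:
//   hdfs dfs -mkdir -p SPRK/Ex2
//   hdfs dfs -put d311.csv SPRK/Ex2/
// Assumption: d311.csv is in the local working directory.
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
fs.mkdirs(new Path("SPRK/Ex2"))
fs.copyFromLocalFile(new Path("d311.csv"), new Path("SPRK/Ex2/d311.csv"))

// Step 2: read the CSV. The header option takes column names from the
// first record; inferSchema is optional and guesses column types.
val df = spark.read.option("header", "true").option("inferSchema", "true").csv("SPRK/Ex2/d311.csv")
df.printSchema()

// Step 3: show() with no argument displays at most 20 rows;
// show(5) displays the first five.
df.show()
df.show(5)

// Step 4: count is an action that returns the number of rows.
df.count()

// Step 5: select four columns, each under an alias (alias names are
// hypothetical), then display the schema of the new DataFrame.
val selected = df.select(
  col("Created Date").alias("created_date"),
  col("Agency").alias("agency"),
  col("Complaint Type").alias("complaint_type"),
  col("City").alias("city"))
selected.printSchema()

// Step 6: separate transformations, then an action.
val picked = df.select("Agency", "City", "Complaint Type")
val withCity = picked.filter(col("City").isNotNull)
withCity.show(20)

// Step 7: the same query chained into a single command.
df.select("Agency", "City", "Complaint Type").filter(col("City").isNotNull).show(20)
```

Steps 6 and 7 produce the same output. The select and filter calls are transformations and only describe the result, so nothing is computed until the show action runs, whether the transformations are stored in intermediate values or chained together.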

