


Exercise 11-2:

Question: Use the d311.csv file as a data source and perform the following steps:

1. Open the file to understand its structure and identify column names.

2. Create a subdirectory RDD/Ex2 in HDFS and upload the d311.csv file to that subdirectory. Start the Spark Shell.

3. Check if there is any header in the file. If there is a header in the first row, then remove it.

4. Create an RDD that reads the d311.csv file and displays the first 10 elements. Provide a screenshot of the results. Use the count action to return the number of items in the RDD.

5. Create a new RDD that captures only the Agency, City, and Descriptor.

6. Display the first few elements of the new RDD. Provide a screenshot of the result.

7. Create a new RDD that captures City and Descriptor, where the descriptor contains the word "Sidewalk". Provide a screenshot of the result.

8. Save the results of the RDD from #7 back into the cluster. Open another terminal and verify that the results are stored in the cluster. Provide a screenshot of the result.




Expert Answer



Below is the Scala code to perform the operations in Spark Shell.
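A minimal sketch of steps 3 through 8 follows, not a definitive solution. It assumes the file was uploaded to RDD/Ex2/d311.csv under your HDFS home directory, that the first row is a header, and that fields are separated by plain commas with no quoted commas inside a field. The output directory name sidewalk_output is hypothetical, and the column indices are placeholders.

```scala
// Run inside spark-shell, where `sc` (the SparkContext) is already defined.

// Step 4 (read): load the file as an RDD of lines.
// Assumption: the path is relative to your HDFS home directory.
val raw = sc.textFile("RDD/Ex2/d311.csv")

// Step 3: inspect the first line, then drop it.
// Assumption: the first line is a header; skip the drop if it is not.
val header = raw.first()
println(header)
val data = raw.mapPartitionsWithIndex { (idx, iter) =>
  if (idx == 0) iter.drop(1) else iter
}

// Step 4 (display and count): show the first 10 rows and count them.
data.take(10).foreach(println)
println(data.count())

// Step 5: keep only Agency, City, and Descriptor.
// Assumption: placeholder indices; adjust them to the real column order.
val agencyIdx = 0
val cityIdx = 1
val descriptorIdx = 2
val maxIdx = List(agencyIdx, cityIdx, descriptorIdx).max

// split(",", -1) keeps trailing empty fields; rows that are too short are skipped.
val fields = data.map(_.split(",", -1)).filter(_.length > maxIdx)
val selected = fields.map(f => (f(agencyIdx), f(cityIdx), f(descriptorIdx)))

// Step 6: display the first few elements.
selected.take(5).foreach(println)

// Step 7: City and Descriptor where the descriptor contains "Sidewalk".
val sidewalkRows = selected.filter { case (_, _, descriptor) => descriptor.contains("Sidewalk") }
val sidewalk = sidewalkRows.map { case (_, city, descriptor) => (city, descriptor) }
sidewalk.take(10).foreach(println)

// Step 8: save the result to HDFS.
// Assumption: hypothetical output directory; it must not exist yet.
sidewalk.saveAsTextFile("RDD/Ex2/sidewalk_output")
```

To verify step 8 from another terminal, a command such as `hdfs dfs -ls RDD/Ex2/sidewalk_output` should list the part files that Spark wrote. Note that `contains` is case-sensitive, so a descriptor spelled "sidewalk" in lowercase would not match.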



The above script assumes that, in d311.csv, the Agency, City, and Descriptor columns are at indices 0, 1, and 2, respectively. Adjust the indices to match the actual column order of your CSV file.
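One way to avoid hard-coding the indices is to look them up from the header row by column name. The sketch below uses a hypothetical header line purely for illustration; the real d311.csv header may use different names and order. In the Spark Shell the header line would typically come from calling `first()` on the RDD that reads the file.

```scala
// Hypothetical header line, used only for illustration.
val sampleHeader = "Unique Key,Agency,Descriptor,City"

// Map each trimmed column name to its zero-based position.
// Assumption: column names contain no commas and are not quoted.
def columnIndex(headerLine: String): Map[String, Int] =
  headerLine.split(",", -1).map(_.trim).zipWithIndex.toMap

val idx = columnIndex(sampleHeader)
val agencyIdx = idx("Agency")
val cityIdx = idx("City")
val descriptorIdx = idx("Descriptor")

println(s"Agency=$agencyIdx, City=$cityIdx, Descriptor=$descriptorIdx")
```

Looking up a name that is not in the header throws a NoSuchElementException, which makes a misspelled or missing column name visible immediately rather than silently selecting the wrong field.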