CSV Data Files

Introduction

A CSV (Comma-Separated Values) file is a plain-text file that stores tabular data (rows and columns).

A CSV file uses commas to separate values (columns/fields) and newlines to separate records (rows).

Normally the first line of a CSV file is a header line. It contains a comma-separated list of headers (names/descriptions) for the columns in the file.
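To make the layout concrete, here is a minimal sketch using Python's standard csv module. The sample text is made up for illustration and does not come from any real file.

```python
import csv
import io

# Made-up sample text: one header line followed by two records.
SAMPLE = "name,age,city\nAda,36,London\nLinus,,Helsinki\n"

reader = csv.reader(io.StringIO(SAMPLE))
header = next(reader)      # first line: the column names
records = list(reader)     # remaining lines: one list of strings per row

print(header)
print(records)
```

Note that the empty age in the second record comes back as an empty string, not as a missing list element.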

Project #1

Download the Titanic CSV file for this project. Read and process the file. (Note: Each row is a passenger.)

Create a program to read the Titanic CSV file one line (row) at a time. Parse each line into its component parts (columns/fields).
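One possible starting point is sketched below. It relies on csv.DictReader rather than splitting on commas by hand, because a quoted field may itself contain a comma. The function names and the column names in the usage note are hypothetical.

```python
import csv

def parse_lines(lines):
    """Yield one dict per data row; `lines` is any iterable of text lines."""
    # DictReader takes the first line as the header and uses it for the keys.
    for row in csv.DictReader(lines):
        yield row

def read_csv_file(path):
    """Read the CSV file at `path` one row at a time."""
    # newline="" lets the csv module handle line endings itself.
    with open(path, newline="", encoding="utf-8") as f:
        yield from parse_lines(f)
```

Usage would look something like `for row in read_csv_file("titanic.csv"): print(row)`, where the file name is an assumption about where you saved the download.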

Count and display how many data items are missing from each column.
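A rough sketch of one approach follows. It treats a field as missing when it is empty or only whitespace, which is an assumption about how the file marks missing data. The sample rows and column names are made up.

```python
import csv
import io

def count_missing(lines):
    """Return {column name: number of rows where that column is empty}."""
    reader = csv.DictReader(lines)
    missing = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            # A short record yields None for absent fields, so guard with `or ""`.
            if not (row[name] or "").strip():
                missing[name] += 1
    return missing

# Made-up sample rows for illustration only.
SAMPLE = "Name,Sex,Age\nA,male,22\nB,,\nC,female,\n"
missing_counts = count_missing(io.StringIO(SAMPLE))
print(missing_counts)
```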

Count and display how many males and females there are.

Create a histogram plot of the age column/field. Does it approximate a bell curve?
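For a graphical plot, a library such as matplotlib is the usual choice. The sketch below instead bins the ages and prints a text histogram so that it runs with the standard library alone. The 10-year bin width and the sample ages are assumptions made for illustration.

```python
from collections import Counter

def age_histogram(ages, bin_width=10):
    """Return {bin start: count} for a list of age strings, skipping blanks."""
    bins = Counter()
    for text in ages:
        text = text.strip()
        if not text:
            continue                               # missing age, skip it
        age = float(text)                          # ages may be fractional
        bins[int(age // bin_width) * bin_width] += 1
    return dict(sorted(bins.items()))

def print_histogram(bins, bin_width=10):
    """Print one line of '#' marks per bin."""
    for start, count in bins.items():
        print(f"{start:3d}-{start + bin_width - 1:3d} | {'#' * count}")

# Made-up sample ages for illustration only.
sample_bins = age_histogram(["22", "38", "", "26", "35", "0.83", "54", "2"])
print_histogram(sample_bins)
```

Bins with no passengers are simply absent from the output, so gaps in the age range will not show as empty rows.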

There are several families among the passengers. Find them and display each surname and count.
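A minimal sketch follows, assuming the names are stored in a column called Name in the form "Surname, Title. Given names". Check your copy of the file, since both the column name and the name format are assumptions here. It also treats every shared surname as one family, which may group unrelated passengers together.

```python
import csv
import io
from collections import Counter

def surname_counts(lines, name_column="Name"):
    """Count passengers per surname (the text before the first comma)."""
    counts = Counter()
    for row in csv.DictReader(lines):
        name = (row.get(name_column) or "").strip()
        if name:
            surname = name.split(",", 1)[0].strip()
            counts[surname] += 1
    return counts

def families(counts):
    """Keep only surnames shared by more than one passenger."""
    return {surname: n for surname, n in counts.items() if n > 1}

# Made-up sample rows for illustration only.
SAMPLE = (
    "Name,Age\n"
    '"Smith, Mr. John",40\n'
    '"Smith, Mrs. Jane",38\n'
    '"Jones, Miss. Amy",22\n'
)
family_counts = families(surname_counts(io.StringIO(SAMPLE)))
print(family_counts)
```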

How many passengers got on at each port of embarkation?
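Counting by port is a tally of the distinct values in one column, and the same helper works for the male/female count. The sketch below assumes column names Sex and Embarked and uses made-up rows. Adjust the names to match the header line of your file.

```python
import csv
import io
from collections import Counter

def tally(lines, column):
    """Count how often each value appears in `column`."""
    counts = Counter()
    for row in csv.DictReader(lines):
        value = (row.get(column) or "").strip()
        counts[value or "(missing)"] += 1   # group blanks under one label
    return counts

# Made-up sample rows for illustration only.
SAMPLE = "Sex,Embarked\nmale,S\nfemale,C\nfemale,S\nmale,\n"
port_counts = tally(io.StringIO(SAMPLE), "Embarked")
sex_counts = tally(io.StringIO(SAMPLE), "Sex")
print(port_counts)
print(sex_counts)
```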

There are many ways of completing this project.

For the Titanic data, click HERE.

Project #2

Using the Titanic dataset, create a CSV file. Aggregate the data into families using surnames. The output CSV file should contain:

Do not forget a header line.
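Writing the output can be sketched with csv.writer, as below. The two columns used here (surname and count) are hypothetical placeholders, not the required output layout. Substitute whatever columns the project calls for.

```python
import csv
import io

def write_family_csv(out, counts):
    """Write a header line, then one record per surname, to the open file `out`."""
    # lineterminator="\n" overrides the csv module's default of "\r\n".
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["surname", "count"])          # header line (hypothetical columns)
    for surname, n in sorted(counts.items()):
        writer.writerow([surname, n])

# Made-up counts for illustration only.
buffer = io.StringIO()
write_family_csv(buffer, {"Smith": 2, "Jones": 1})
print(buffer.getvalue())
```

To write a real file, pass `open("families.csv", "w", newline="", encoding="utf-8")` in place of the StringIO buffer. The file name is only an example.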

Project #3

Locate several test CSV files on the web and download them for processing.

Note: I have seen CSV files that have truncated records, where the trailing commas that would indicate empty columns/fields are left out. In this case, the parser fills in the missing columns with empty strings because it knows how many columns there should be. For example:

If a file has 6 columns and a record contains a,b,c the parser assumes a,b,c,,,
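The padding rule can be sketched as follows. This assumes the header line gives the expected number of columns. The function names are hypothetical.

```python
import csv

def pad_record(fields, width):
    """Append empty strings until the record has `width` fields."""
    # If the record is already long enough, the multiplier is <= 0 and nothing is added.
    return fields + [""] * (width - len(fields))

def read_padded(lines):
    """Return (header, records), with every record padded to the header's width."""
    reader = csv.reader(lines)
    header = next(reader)
    return header, [pad_record(record, len(header)) for record in reader]

print(pad_record(["a", "b", "c"], 6))
```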

The Titanic Dataset

To download the Titanic CSV file, click here.

Links

LEARN PANDAS in about 10 minutes! A great python module for Data Science! (YouTube)

pandas documentation (Python Documentation)

CSV File Reading and Writing (Python Documentation)