Count the Number of Words in a Text File

Project #1

Do Part 1 first, then do Part 2.

Part 1. Write a program to count the number of words and lines in a text file.
Part 2. Collect statistics on the number and lengths of words. Print the statistics? Plot the statistics?
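One possible starting point for Parts 1 and 2 is sketched below. It assumes a "word" is any run of characters between whitespace, which is only one of several reasonable definitions. The function names and the sample text are made up for illustration.

```python
from collections import Counter


def count_lines_and_words(text):
    """Return (line_count, word_count) for a string of text."""
    lines = text.splitlines()
    words = text.split()  # assumption: words are separated by whitespace
    return len(lines), len(words)


def word_length_stats(text):
    """Return a Counter mapping word length -> number of words of that length."""
    return Counter(len(word) for word in text.split())


sample = "We hold these truths\nto be self-evident\n"
print(count_lines_and_words(sample))  # (2, 7)

# Print a crude text histogram: one row per word length.
for length, count in sorted(word_length_stats(sample).items()):
    print(f"{length:2d} {'*' * count}")
```

The text histogram is one way to "print" the statistics. A plotting library could draw the same data as a bar chart.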

Project #2

How many unique words are there in the text file? For example, how many times does the word "the" appear? How many times do the other words appear?
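A minimal sketch of one approach, using collections.Counter from the standard library. It assumes words are runs of the letters a-z (with optional internal apostrophes) and that "The" and "the" are the same word. Both are simplifying assumptions, and the function name is hypothetical.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return a Counter mapping each lowercased word to its number of occurrences."""
    # Assumption: a word is letters a-z, optionally joined by apostrophes (e.g. "don't").
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return Counter(words)


freq = word_frequencies("The cat saw the dog. The dog ran.")
print(len(freq))     # number of unique words: 5
print(freq["the"])   # 3

# The most frequent words, most common first.
for word, count in freq.most_common(3):
    print(word, count)
```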

Project #3

How many sentences end with a period? A question mark? ...
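One rough way to approach this is to count each ".", "?" or "!" that is followed by whitespace or the end of the text. This is only a heuristic sketch: abbreviations such as "e.g." are counted as sentence ends, and a mark followed by a closing quote is missed. The function name is made up for illustration.

```python
import re
from collections import Counter


def sentence_endings(text):
    """Count '.', '?' and '!' characters that are followed by whitespace or end of text."""
    endings = re.findall(r"[.?!](?=\s|$)", text)
    return Counter(endings)


counts = sentence_endings("Is it done? Yes. It is done. Great!")
print(counts["."])  # 2
print(counts["?"])  # 1
print(counts["!"])  # 1
```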

Design

  1. Ask the user for the name of the file to process
  2. If no file name was entered, exit
  3. Process the file
  4. Print the file's statistics
  5. Loop back to step 1
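The design steps above might be sketched as follows. The names process_file and main, the prompt text, and the UTF-8 encoding are all assumptions. The read_name parameter exists only so the loop can be exercised without typing at a keyboard.

```python
def process_file(path):
    """Return (line_count, word_count) for the file at path."""
    lines = words = 0
    # Assumption: the file is UTF-8 text.
    with open(path, encoding="utf-8") as f:
        for line in f:
            lines += 1
            words += len(line.split())
    return lines, words


def main(read_name=input):
    """Ask for file names until a blank name is entered."""
    while True:
        name = read_name("File name (blank to quit): ").strip()
        if not name:  # step 2: no file name entered, so exit
            break
        try:
            lines, words = process_file(name)  # step 3
        except (OSError, UnicodeDecodeError) as err:
            print(f"Cannot process {name}: {err}")
            continue
        print(f"{name}: {lines} lines, {words} words")  # step 4
```

Call main() to start the loop. A missing or unreadable file prints a message and the loop asks again rather than crashing.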

Possible Text Files For Testing

Declaration of Independence
United States Constitution
Your favorite magazine story or book
This document (screen scrape the text and paste into a text file)

Things to Think About

1. How to distinguish words in a text file?
   (separated/terminated by spaces, punctuation, EOS, EOF?)
2. What to do with non-text files? How can you know it is a text file?
3. Collect statistics on sentence length?

Note:

Linux/Unix wc

Linux and Unix have a built-in command, "wc", which counts the lines, words, and bytes in a file. In this project you are to write a program in Python 3, but it is interesting to read the "wc" documentation and try it out.

To display the wc documentation:

man wc

To run wc:

wc text-file-name

Links

Plain Text - Dylan Beattie - NDC Oslo 2021 (YouTube)