Project #1
Read the names of two text files from the command line and compare them.
Display the lines that are different.
Also display statistics:
- file names
- number of lines that matched
- number of lines that did not match
Note: The second file on the command line should be compared to the first file
on the command line. Think of the first file as the standard and the
second file as containing changes.
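
One possible starting point for Project #1 is sketched below in Python. It
assumes the simplest interpretation: lines are compared by position (line 1
against line 1, line 2 against line 2, and so on), and a line that exists in
only one file counts as not matching. The names compare_files and main are
illustrative, and reading the files as UTF-8 is an assumption.

```python
import sys
from itertools import zip_longest


def compare_files(standard_path, changed_path):
    """Compare two files line by line, by position.

    Returns (matched, mismatched, differences), where differences is a list
    of (line_number, standard_line, changed_line) tuples. A line missing
    from the shorter file is reported as None.
    """
    with open(standard_path, encoding="utf-8") as f:
        standard = f.read().splitlines()
    with open(changed_path, encoding="utf-8") as f:
        changed = f.read().splitlines()

    matched = 0
    differences = []
    # zip_longest pads the shorter file with None so extra lines are reported.
    for number, (a, b) in enumerate(zip_longest(standard, changed), start=1):
        if a == b:
            matched += 1
        else:
            differences.append((number, a, b))
    return matched, len(differences), differences


def main(argv):
    """Print the differing lines and the statistics; return an exit status."""
    if len(argv) != 3:
        print(f"usage: {argv[0]} STANDARD_FILE CHANGED_FILE", file=sys.stderr)
        return 2
    matched, mismatched, differences = compare_files(argv[1], argv[2])
    for number, a, b in differences:
        print(f"line {number}:")
        print(f"  < {a if a is not None else '(no line)'}")
        print(f"  > {b if b is not None else '(no line)'}")
    print(f"standard file: {argv[1]}")
    print(f"changed file:  {argv[2]}")
    print(f"lines matched: {matched}")
    print(f"lines not matched: {mismatched}")
    return 0
```

To run this as a script, call sys.exit(main(sys.argv)) under the usual
if __name__ == "__main__": guard. Note that with positional comparison, a
single inserted line makes every later line mismatch.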
Project #2
Same as Project #1, except the statistics should report:
- number of lines in file A not in file B
- number of lines in file B not in file A
This is a harder problem.
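
One way to approach Project #2, sketched here in Python, is to let the
standard library's difflib.SequenceMatcher align the two lists of lines and
then count what falls outside the aligned runs. This is only one possible
interpretation of "in file A not in file B"; the function name
count_unmatched is illustrative, and a changed line is assumed to count once
for each file.

```python
import difflib


def count_unmatched(lines_a, lines_b):
    """Return (only_in_a, only_in_b, matched) line counts.

    Lines are aligned with difflib.SequenceMatcher, so an inserted or
    deleted line does not cause every following line to mismatch.
    """
    matcher = difflib.SequenceMatcher(None, lines_a, lines_b, autojunk=False)
    only_in_a = only_in_b = matched = 0
    for tag, a_start, a_end, b_start, b_end in matcher.get_opcodes():
        if tag == "equal":
            matched += a_end - a_start
        else:
            # "delete" has an empty B range, "insert" an empty A range,
            # and "replace" contributes to both counts.
            only_in_a += a_end - a_start
            only_in_b += b_end - b_start
    return only_in_a, only_in_b, matched
```

For example, comparing the lines one, two, three, four against
one, three, four, five should report one line only in file A (two), one line
only in file B (five), and three matched lines. Writing the alignment step by
hand instead of using difflib is where the difficulty of this project lies.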
Things to Think About
- Tabs vs Spaces? Expand tabs then compare?
- Blank (empty) lines? Do they count? What if file A has an empty line
(no characters) and file B has a line with a space?
- What if file A has a space in the middle of a line
and file B has two spaces?
- What is a blank (empty) line? Is it a line with only spaces or no spaces?
- Should there be a command line option to ignore blank (empty) lines?
Lines with more than one space? Lines that are all spaces?
- Uppercase vs lowercase? Convert to uppercase or lowercase before comparing lines?
- What about ASCII vs UTF-8 characters?
- If one or more lines have been inserted into a file, how far do we skip ahead
before we start trying to match lines again?
See the Linux/Unix diff command documentation for some ideas.
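
Several of the whitespace and case questions above can be handled by
normalizing each line before comparing it. The Python sketch below shows one
set of possible answers, not the required ones: the function names, the option
names, and the tab width of 8 are all assumptions. Each option could be tied
to a command line switch.

```python
def normalize(line, expand_tabs=True, collapse_spaces=True, ignore_case=False):
    """Return a canonical form of `line` for comparison purposes."""
    if expand_tabs:
        line = line.expandtabs(8)
    if collapse_spaces:
        # Split on any whitespace run and rejoin with single spaces; this
        # also strips leading and trailing whitespace.
        line = " ".join(line.split())
    if ignore_case:
        line = line.casefold()
    return line


def prepare(lines, ignore_blank=False, **options):
    """Normalize every line, optionally dropping lines that end up empty."""
    normalized = [normalize(line, **options) for line in lines]
    if ignore_blank:
        normalized = [line for line in normalized if line]
    return normalized
```

With the defaults, a line containing only spaces normalizes to the empty
string, so it compares equal to a truly empty line, and "a  b" compares equal
to "a b". Collapsing whitespace also discards indentation, which may be the
wrong choice for files where indentation is significant.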