Convert MD File to HTML

Introduction

An MD file is a text file created using one of several possible dialects of the Markdown language. It is saved in plain text format but includes inline symbols that define how to format the text (e.g. bold, indentations, headers, table formatting). MD files are designed for authoring plain text documentation that can be easily converted to HTML. From: .MD File Extension

Mark-up is a term from print editing - the editor would go through the text and add annotations (i.e. this in italic, that in bold) for the printers to use when producing the final version. This was called marking up the text. A computer mark-up language is just a standardized short-hand for these sorts of annotations. HTML is basically the web's standard mark-up language, but it's rather verbose. For example, a list in HTML:

    <ul>
      <li>Item one</li>
      <li>Item two</li>
    </ul>

Markdown is a specific markup language, having its own simple syntax. For example, a list in Markdown:

    * Item one
    * Item two

Markdown can't do everything HTML can, but both are mark-up languages.

From: Markdown vs markup - are they related?

Project #1

Create a Python program that will parse an MD file and create an equivalent HTML file. Do not use any existing modules or programs. Parse the text yourself.

Convert only a subset of the markdowns. (They are described below.)

The output HTML file should have the same name as the input MD file but with the file type '.html'.
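The naming rule above can be sketched with the standard library's pathlib. This is only an illustration: the function name html_path_for is hypothetical, and it assumes the "no existing modules" rule is aimed at Markdown-parsing modules rather than at general-purpose ones such as pathlib.

```python
from pathlib import Path


def html_path_for(md_path):
    """Return the output path: same name as the input, but with '.html' as the file type."""
    # with_suffix() replaces only the final suffix, so 'notes.v2.md' becomes 'notes.v2.html'.
    return Path(md_path).with_suffix(".html")


print(html_path_for("tutorial.md"))   # tutorial.html
```

If pathlib is considered off limits, plain string slicing on the last '.' would do the same job.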

Use the following template for your HTML output

<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8" />
  <meta name="author" content="Your Name Here" />
  <title>Your Page Title Here</title>
</head>
<body>
  Your HTML Here
</body>
</html>
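One possible way to fill in the template from Python is a format string with three placeholders. The names HTML_TEMPLATE and wrap_in_template are hypothetical, and this sketch assumes the title and author contain no characters that need HTML escaping.

```python
# The assignment's template, with placeholders for the three parts that change.
HTML_TEMPLATE = """<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="author" content="{author}" />
<title>{title}</title>
</head>
<body>
{body}
</body>
</html>
"""


def wrap_in_template(body, title, author):
    """Insert the converted HTML body, page title, and author into the template."""
    # str.format() does not re-scan the inserted values, so braces in `body` are safe.
    return HTML_TEMPLATE.format(author=author, title=title, body=body)


page = wrap_in_template("<p>Hello</p>", "Demo Page", "Your Name Here")
print(page)
```

Writing the result to disk is then a single open(..., 'w') followed by write(page).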

Hint: For testing, double click on the HTML file or drag-and-drop the file onto the web browser icon.

Demonstrate your MD converter on the MD files found HERE

One approach (my approach) can be found HERE. It may not be the best solution, but it gets the job done.

Use These MarkDown Tags
MarkDown          HTML
#                 <h1> ... </h1>
##                <h2> ... </h2>
**bold**          <b> ... </b>
//italic//        <i> ... </i>
__underlined__    <u> ... </u>
Of course you can **__//combine//__** all these.
The HTML paragraph tags are <p> ... </p>
The HTML line break tag is <br>
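As a starting point, the two heading rows of the table could be handled by looking at the start of each line. This is a rough sketch, not the reference solution: the name convert_heading is hypothetical, and it assumes a heading occupies a whole line and that three or more # marks are outside the required subset.

```python
def convert_heading(line):
    """Return an <h1> or <h2> element for a heading line, or None for any other line."""
    if line.startswith("##") and not line.startswith("###"):
        level = 2
    elif line.startswith("#") and not line.startswith("##"):
        level = 1
    else:
        return None  # not a heading in this subset
    # Drop the # marks and any surrounding whitespace.
    text = line[level:].strip()
    return f"<h{level}>{text}</h{level}>"


print(convert_heading("# Convert MD File to HTML"))
print(convert_heading("## Introduction"))
```

Lines for which the function returns None would go on to the paragraph and inline-tag handling.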

Paragraphs are separated by blank lines. If you want to force a newline within a paragraph, you can use two backslashes followed by whitespace or the end of the line. MarkDown tags do not cross paragraph boundaries. All MarkDown tags must be completed within a single paragraph. If not, it is an error.
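The paragraph and forced-newline rules might be sketched as below. All three function names are hypothetical, and the sketch assumes that a line containing only whitespace counts as blank and that the whitespace after the two backslashes can be left in the output.

```python
def split_paragraphs(text):
    """Group consecutive non-blank lines into paragraphs (each a list of lines)."""
    paragraphs, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line.rstrip())
        elif current:              # a blank line ends the current paragraph
            paragraphs.append(current)
            current = []
    if current:                    # the last paragraph may not be followed by a blank line
        paragraphs.append(current)
    return paragraphs


def apply_line_breaks(line):
    """Replace two backslashes followed by whitespace or end-of-line with <br>."""
    out, i = [], 0
    while i < len(line):
        at_marker = line.startswith("\\\\", i)
        if at_marker and (i + 2 == len(line) or line[i + 2].isspace()):
            out.append("<br>")
            i += 2
        else:
            out.append(line[i])
            i += 1
    return "".join(out)


def paragraph_to_html(lines):
    """Wrap one paragraph's lines in <p> ... </p>."""
    return "<p>" + "\n".join(apply_line_breaks(ln) for ln in lines) + "</p>"


sample = "first line\\\\\nsecond line\n\nnext paragraph\n"
for para in split_paragraphs(sample):
    print(paragraph_to_html(para))
```

Heading lines and inline tags are deliberately left out here so that the two rules stay easy to see.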

Parsing an MD file is complicated. To simplify the parser for this problem, MarkDown tags must be completed within a single line.
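With the single-line restriction, one way to convert the bold, italic, and underline tags is to scan a line left to right and keep the currently open tags on a stack. The sketch below is an assumption about how this could look, not the linked solution; MARKERS, MarkdownError, and convert_inline are hypothetical names. Note that it treats every // as an italic marker, so a URL such as http://... would need extra handling.

```python
MARKERS = {"**": "b", "//": "i", "__": "u"}   # MarkDown marker -> HTML tag name


class MarkdownError(Exception):
    """Raised when a tag is left open or closed out of order."""


def convert_inline(line):
    """Convert the bold, italic, and underline markers found in one line."""
    out = []
    stack = []                     # markers that are currently open, innermost last
    i = 0
    while i < len(line):
        marker = line[i:i + 2]
        if marker in MARKERS:
            tag = MARKERS[marker]
            if stack and stack[-1] == marker:
                stack.pop()        # closes the innermost open tag
                out.append(f"</{tag}>")
            elif marker in stack:
                raise MarkdownError(f"{marker} closed before {stack[-1]}")
            else:
                stack.append(marker)
                out.append(f"<{tag}>")
            i += 2
        else:
            out.append(line[i])
            i += 1
    if stack:                      # a tag was never completed within the line
        raise MarkdownError(f"unclosed {stack[-1]} in line: {line!r}")
    return "".join(out)


print(convert_inline("Of course you can **__//combine//__** all these."))
# Of course you can <b><u><i>combine</i></u></b> all these.
```

Raising an exception keeps the error rule explicit; the caller can catch it and report the offending line.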

Project #2

Add more markdowns to your program (2 or 3). You don't need to do them all.

Project #3

Create your own .MD file and convert it. (Copy-and-paste if you find some text you like.)

Project #4

Create a graphics program that has a text area where the user can write their own MD text. Have a button that will generate an HTML file from the user's input. You will also need a way to display error messages.

Project #5

You can see my approach to the problem using a LIFO queue (link above). Can you do it by using recursion?
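For comparison, here is one way the recursive idea could be shaped: each opening marker triggers a recursive call that consumes text up to the matching closing marker, so the call stack takes the place of the explicit LIFO queue. This is only a sketch under the single-line rule; MARKERS, parse_span, and convert_inline_recursive are hypothetical names, and the error reporting is minimal.

```python
MARKERS = {"**": "b", "//": "i", "__": "u"}   # MarkDown marker -> HTML tag name


def parse_span(line, pos=0, closing=None):
    """Convert line[pos:] up to the `closing` marker; return (html, next_pos)."""
    out = []
    while pos < len(line):
        marker = line[pos:pos + 2]
        if marker == closing:
            return "".join(out), pos + 2          # this span is complete
        if marker in MARKERS:
            tag = MARKERS[marker]
            # Recurse: the nested call returns once it finds the matching marker.
            inner, pos = parse_span(line, pos + 2, marker)
            out.append(f"<{tag}>{inner}</{tag}>")
        else:
            out.append(line[pos])
            pos += 1
    if closing is not None:                       # ran off the end inside a tag
        raise ValueError(f"unclosed {closing} in line: {line!r}")
    return "".join(out), pos


def convert_inline_recursive(line):
    """Convert the bold, italic, and underline markers found in one line."""
    html, _ = parse_span(line)
    return html


print(convert_inline_recursive("**__//combine//__** all these"))
# <b><u><i>combine</i></u></b> all these
```

Try writing your own version before reading this one closely; there are several reasonable ways to structure the recursion.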

Links

Markdown Home
Markdown Guide - Getting Started
Markdown Guide - Basic Syntax
tutorial.md
DukoWiki (home)
How would you go about parsing Markdown?
Creating Your Own Markdown Parser
A markdown parser with high extensibility
Create markdown_parser
Parsing Markdown into an Automated Table of Contents

You can test regular expressions at regx101.com

Markup spec/BNF
js-play/grammar.md at master - Markdown
Markdown Syntax Reference

Hint

Because this is a demo and the input file is small, you could use the Python read method to read the whole file with one read. The read() method can return a specified number of characters from a file opened in text mode (or bytes from a file opened in binary mode). The default size is -1, which means "read the whole file". You could also read a line at a time.

#-------------------------------------------------
# read the whole file
#-------------------------------------------------
with open('Tom_Jones.md', 'r') as f:
    print(f.read())

#-------------------------------------------------
# read the file 24 characters at a time
# (in text mode, read(n) counts characters, not bytes)
#-------------------------------------------------
with open('Mother_Jones.md', 'r') as f:
    while True:
        x = f.read(24)
        if len(x) == 0:        # an empty string means end of file
            break
        print(x)

#-------------------------------------------------
# read the file one line at a time
#-------------------------------------------------
with open('John_Paul_Jones.md', 'r') as f:
    for line in f:
        line = line.strip()    # removes the newline and any surrounding whitespace
        print(line)

#-------------------------------------------------
# read the file 1 character at a time using a generator
#-------------------------------------------------
def read_chunks(fobj, size):
    while True:
        chunk = fobj.read(size)
        if not chunk:          # end of file
            return
        yield chunk

with open('Spike_Jones.md', 'r') as fobj:
    for c in read_chunks(fobj, 1):
        if ord(c) == 10:       # newline
            print(f'ord={ord(c):3} c=\\n')
        elif ord(c) == 32:     # space
            print(f'ord={ord(c):3} c=" "')
        else:
            print(f'ord={ord(c):3} c={c}')