Hex Dump UTF-8

Sample Code

#!/usr/bin/python3
# ===================================================================
# display (printable?) utf-8 characters and their bytes as hex
# ===================================================================

import io

filename = 'z_utf8.bin'

with open(filename, 'rb') as f:

    wf = io.TextIOWrapper(f,'utf-8')

    ##loop_max = 0                  # limit input for testing

    while True:

        ##if loop_max > 2: break    # limit reached?
        ##loop_max += 1             # increment loop count

        # ---- get the next utf-8 encoded character

        x = wf.read(1)

        ##print(f'\nx = {x} {type(x)}')

        # ---- end of file?

        if len(x) < 1:
            print('End of File')
            break

        # ---- convert UTF-8 character to bytes

        byts = bytes(x,'utf-8')

        ##print(f'byts = {byts} {type(byts)}')

        # ---- is it a printable character?

        if x.isprintable():
            c = x
        else:
            c = 'xx'

        # ---- print a UTF-8 character and its bytes (as hex)

        print(f'({c}) ',end='')

        for byt in byts:
            ##print(f'\nbyt = {byt:02x} {type(byt)}')
            print(f'{byt:02x} ',end='')

        # ---- print end-of-line (lf)

        print()

Project X

Write a program that reads a file one byte at a time. Collect the bits/bytes that make up each character and display the UTF-8 characters as a hex dump.
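One possible starting point for the project is sketched here. It is not a finished solution: the function names sequence_length and dump_utf8 are made up for this illustration, and the sketch assumes the input is mostly well-formed. It uses the lead byte to decide how many bytes belong to the character, then reads the rest one byte at a time. A malformed sequence is reported as None, and the bytes already read for it are not re-examined.

```python
import io

def sequence_length(lead):
    """Return the byte count implied by a UTF-8 lead byte (0 if not a lead byte)."""
    if lead & 0b10000000 == 0b00000000:   # 0xxxxxxx : 1 byte (ASCII)
        return 1
    if lead & 0b11100000 == 0b11000000:   # 110xxxxx : 2 bytes
        return 2
    if lead & 0b11110000 == 0b11100000:   # 1110xxxx : 3 bytes
        return 3
    if lead & 0b11111000 == 0b11110000:   # 11110xxx : 4 bytes
        return 4
    return 0                              # 10xxxxxx or other invalid lead

def dump_utf8(stream):
    """Yield (character, bytes) pairs, reading a binary stream one byte at a time."""
    while True:
        first = stream.read(1)
        if not first:                     # end of file
            break
        n = sequence_length(first[0])
        buf = first
        while len(buf) < n:               # collect the remaining bytes
            nxt = stream.read(1)
            if not nxt:                   # file ended in mid-character
                break
            buf += nxt
        try:
            ch = buf.decode('utf-8')
        except UnicodeDecodeError:
            ch = None                     # malformed or truncated sequence
        yield ch, buf

if __name__ == '__main__':
    sample = io.BytesIO('aé€'.encode('utf-8'))
    for ch, raw in dump_utf8(sample):
        shown = ch if ch is not None and ch.isprintable() else 'xx'
        print(f'({shown}) ' + ' '.join(f'{b:02x}' for b in raw))
```

To read a real file, pass an object opened with open(filename, 'rb') instead of the io.BytesIO sample.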

Display the bits?
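If the bits are wanted too, Python's format specification can print each byte as eight binary digits. The helper name bits_of below is made up for this sketch.

```python
def bits_of(data):
    """Return the bytes in data as space-separated 8-bit binary strings."""
    return ' '.join(f'{b:08b}' for b in data)

if __name__ == '__main__':
    # the euro sign is a 3-byte character: 1110xxxx 10xxxxxx 10xxxxxx
    print(bits_of('€'.encode('utf-8')))
```

The leading bits of each byte show the UTF-8 structure: 1110 marks the first byte of a 3-byte character and 10 marks each continuation byte.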

FYI: UTF-8 (Wikipedia)

BOM (Byte Order Mark)

See the UTF-8 section on the Wikipedia entry Byte order mark.
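In UTF-8 the BOM is the three bytes EF BB BF at the start of a file. A hex dump will show it as the first "character" unless it is skipped. The sketch below shows one way to detect and remove it using the constant codecs.BOM_UTF8 from the standard library. The function name strip_utf8_bom is made up for this example.

```python
import codecs

def strip_utf8_bom(data):
    """Return (data without a leading UTF-8 BOM, True if a BOM was found)."""
    if data.startswith(codecs.BOM_UTF8):          # b'\xef\xbb\xbf'
        return data[len(codecs.BOM_UTF8):], True
    return data, False

if __name__ == '__main__':
    raw = codecs.BOM_UTF8 + 'hi'.encode('utf-8')
    body, had_bom = strip_utf8_bom(raw)
    print(had_bom, body)
```

When decoding text, the 'utf-8-sig' codec removes a leading BOM automatically, while plain 'utf-8' keeps it as the character U+FEFF.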

Links

Python IO Library

Python bit functions on int (bit_length, to_bytes and from_bytes)

Really Good, Bad UTF-8 example test data [closed]