Sample Code
#!/usr/bin/python3
# ===================================================================
# display (printable?) utf-8 characters and their bytes as hex
# ===================================================================
import io
filename = 'z_utf8.bin'
with open(filename, 'rb') as f:
    wf = io.TextIOWrapper(f,'utf-8')
    ##loop_max = 0                    # limit input for testing
    while True:
        ##if loop_max > 2: break      # limit reached?
        ##loop_max += 1               # increment loop count
        # ---- get the next utf-8 encoded character
        x = wf.read(1)
        ##print(f'\nx = {x} {type(x)}')
        # ---- end of file?
        if len(x) < 1:
            print('End of File')
            break
        # ---- convert UTF-8 Character to bytes
        byts = bytes(x,'utf-8')
        ##print(f'byts = {byts} {type(byts)}')
        # ---- is it a printable character?
        if x.isprintable():
            c = x
        else:
            c = 'xx'
        # ---- print a UTF-8 character and its bytes (as hex)
        print(f'({c}) ',end='')
        for byt in byts:
            ##print(f'\nbyt = {byt:02x} {type(byt)}')
            print(f'{byt:02x} ',end='')
        # ---- print end-of-line (lf)
        print()
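The sample above expects a file named z_utf8.bin to exist. The following is a minimal sketch for creating one. The function name make_test_file and the test characters are assumptions chosen for illustration (one character each with a 1-, 2-, 3- and 4-byte UTF-8 encoding, plus a line feed); they are not part of the original sample.

```python
#!/usr/bin/python3
# ---- create a small UTF-8 test file (hypothetical helper)

def make_test_file(path):
    """Write a few UTF-8 encoded characters to path; return the bytes written."""
    # A (1 byte), e-acute (2 bytes), euro sign (3 bytes),
    # an emoji (4 bytes), and a line feed (1 byte)
    text = 'A\u00e9\u20ac\U0001F600\n'
    data = text.encode('utf-8')
    with open(path, 'wb') as f:
        f.write(data)
    return data

if __name__ == '__main__':
    data = make_test_file('z_utf8.bin')
    print(f'wrote {len(data)} bytes')
```

The line feed is not printable, so the sample code should show it as (xx) 0a.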
Project X
Write a program that reads a file one byte at a time.
Collect the bits/bytes that make up a character and display the UTF-8 characters
as a hex dump.
Display the bits?
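One possible approach to the project above, offered as a sketch rather than the intended solution: the high bits of the first byte of a UTF-8 sequence give the sequence length (0xxxxxxx = 1 byte, 110xxxxx = 2, 1110xxxx = 3, 11110xxx = 4). The names utf8_length, dump_utf8 and format_line are hypothetical. The sketch assumes mostly well-formed input; after a lead byte it reads the expected number of bytes without checking that each one is a continuation byte (10xxxxxx), so a malformed sequence may swallow the bytes that follow it.

```python
import io

def utf8_length(first_byte):
    """Return the sequence length implied by a UTF-8 lead byte (0 if invalid)."""
    if first_byte < 0x80:            # 0xxxxxxx : 1 byte (ASCII)
        return 1
    if first_byte >> 5 == 0b110:     # 110xxxxx : 2 bytes
        return 2
    if first_byte >> 4 == 0b1110:    # 1110xxxx : 3 bytes
        return 3
    if first_byte >> 3 == 0b11110:   # 11110xxx : 4 bytes
        return 4
    return 0                         # continuation byte or invalid lead byte

def dump_utf8(f):
    """Read binary file object f one byte at a time; yield (char, bytes)."""
    while True:
        b = f.read(1)
        if len(b) < 1:               # end of file
            break
        seq = b
        for _ in range(max(utf8_length(b[0]) - 1, 0)):
            seq += f.read(1)         # collect the remaining bytes
        try:
            ch = seq.decode('utf-8')
        except UnicodeDecodeError:
            ch = None                # invalid or truncated sequence
        yield ch, seq

def format_line(ch, seq):
    """Format one character as '(c) hex bytes | bits'."""
    shown = ch if ch is not None and ch.isprintable() else 'xx'
    hexes = ' '.join(f'{byt:02x}' for byt in seq)
    bits = ' '.join(f'{byt:08b}' for byt in seq)
    return f'({shown}) {hexes} | {bits}'

# ---- usage: an in-memory stream stands in for a file opened with 'rb'
for ch, seq in dump_utf8(io.BytesIO('A\u20ac'.encode('utf-8'))):
    print(format_line(ch, seq))
# (A) 41 | 01000001
# (€) e2 82 ac | 11100010 10000010 10101100
```

To run it against a real file, pass a file object opened with open(filename, 'rb') in place of the io.BytesIO object.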
FYI: UTF-8 (Wikipedia)
BOM (Byte Order Mark)
See the UTF-8 section of the Wikipedia entry Byte order mark.
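In UTF-8 the BOM, when present, is the three bytes EF BB BF at the start of the file. A short sketch of detecting and skipping it follows; the helper name strip_utf8_bom is hypothetical, while codecs.BOM_UTF8 and the 'utf-8-sig' codec come from the Python standard library.

```python
import codecs

def strip_utf8_bom(data):
    """Return (had_bom, data_without_bom) for a bytes object."""
    if data.startswith(codecs.BOM_UTF8):      # b'\xef\xbb\xbf'
        return True, data[len(codecs.BOM_UTF8):]
    return False, data

raw = codecs.BOM_UTF8 + 'abc'.encode('utf-8')
had_bom, body = strip_utf8_bom(raw)
print(had_bom, body)                 # True b'abc'

# ---- the 'utf-8-sig' codec skips a leading BOM when decoding
print(raw.decode('utf-8-sig'))       # abc

# ---- the plain 'utf-8' codec keeps it as the character U+FEFF
print(repr(raw.decode('utf-8')))     # '\ufeffabc'
```

A program that reads the file byte by byte would see the BOM as an ordinary 3-byte sequence unless it checks for it explicitly.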
Links
Python IO Library
Python bit functions on int (bit_length, to_bytes and from_bytes)
Really Good, Bad UTF-8 example test data [closed]