Demonstrate a Lookaside List

Introduction

Lookaside lists can be used to improve performance in some cases. In this project you will simulate a lookaside lists, and collect and display performance statistics. It is an artificial problem that will demonstrate lookaside lists.

Note: There are Python modules that can be use to collect performance statistics. Examples are:

Be sure to search the web for more information, examples, and modules.

Problem Description

image missing

Note: I worked on a project that was similar to the problem described above. Implementing a simple (1 deep) lookaside list significantly improved throughput and reduced CPU usage.

The Data Store

I suggest you use one of the memory/file databases as your "data store" instead of creating your own. There are several that are available. Using one will simplify your code and be more realistic.

DBM is an example of one:

GNU dmb (gdbm) Documentation
dbm — Interfaces to Unix "databases"
Python – Database Manager (dbm) package
Python Data Persistence - dbm Package
Python has a built-in persistent key-value store

code example from the web

import dbm with dbm.open('my_store', 'c') as db: db['key'] = 'value' print(db.keys()) # ['key'] print(db['key']) # 'value' print('key' in db) # True

My DBM documentation/Cheatsheet

Project #1

This project will demonstrate the use of a 1 deep lookaside list and how/if it can improve efficiency by collecting and displaying performance statistics.

Notes:

  1. This will not be a real system. Keep everything in one program if possible.
  2. Functions can be used to generate messages, data display, data store, etc.
  3. use @dataclass objects as records?

Design

Which performance statistics should be collected and displayed?

I suggest collecting execution time for 2,000 runs of 2,000 messages to start with. Modify this until you get reliable statistics. Remember to use the same mix and order of messages. Also, collect record message type statistics to verify the record stream mix.

Actions Base On Message Mix Assumptions

ActionDescription
add a record The record is added to the data store and also put in the lookaside list.

If the record already exists in the data store, generate an error.

(If the lookaside list is more than 1 deep, replace a random record
or the oldest record. This depends on the expected message mix.)
modify a record Before fetching a copy of a record from the data store, see if it is
in the lookaside list. If not, fetch the record from the data store
into the lookaside list. Modify the record in the lookaside list and
send it to the data store.

If the record does not exists in the data store, generate an error.

(If the lookaside list is more than 1 deep, replace a random record
or the oldest record. This depends on the expected message mix.)
delete a record Delete the record from the data store, and the lookaside list.
display a
record's fields
If the record is not in the lookaside list, fetch a copy from the data store.
Display the record's field(s). Do not modify the lookaside list.

(This assumes there will not be a need for this record right away.)

Would it be appropriate for each message type have its own lookaside list?

Project #2

Allow the user to vary the size of the lookaside list, the fetch delay/latency, the mix of messages, etc.

Plot the performance statistics.

Questions

1. Why are we only interested in the "fetch record" latency?

We want the data store to contain the most up-to-date data as possible. Thus the faster we can update the data store, the more up-to-date it is.

The assumption is that the read/write code is as fast as possible and can not be improved. Faster hardware is a possibility but that decision is up to others.

The lookaside list can effect the apparent latency in "data store" reads because (in some circumstances) we do not perform a read. The data is already available in the lookaside list.

2.What is important about the incoming message mix?

It is important to understand the incoming message mix. For example, if the mix is truly random, the use of a lookaside list would have little effect on performance. Because the modification messages are "clustered", we get more-bang-for-our-buck by using a lookaside list.

3.Why collect statistics, etc?

To get more-bang-for-our-buck, you must "tune" the size of the lookaside list to the message mix. If the message mix changes significantly, using a lookaside list may be ineffective and perhaps even hurt performance.

4. Isn't a lookaside list just like caching?

Yes.

5. Where is a lookaside list used?

Lookaside lists (caches) are used in high performance situations. For example, operating systems and hardware. (And of course this demo.)