SmartDict: Dynamic pointing of values in Python dictionaries

Introduction

SmartDict lets dictionary values reference other values through chained paths, which reduces redundancy in configuration dictionaries.

Installation

pip install smartdict

Usage

Assuming the following configuration dictionary exists:

{
    "dataset": "spotify",
    "load": {
        "train_path": "~/data/spotify/train",
        "dev_path": "~/data/spotify/dev",
        "test_path": "~/data/spotify/test"
    },
    "network": {
        "num_hidden_layers": 3,
        "num_attention_heads": 8,
        "hidden_size": 64
    },
    "store": "checkpoints/spotify/3L8H/"
}

Many of these values are redundant. Paths such as train_path repeat the dataset name stored in dataset, and the store path repeats both the dataset name and the network structure (3 hidden layers, 8 attention heads). As a result, changing the dataset or the network structure means editing several entries by hand.

One solution is to resolve the redundancy in Python code. For example, if train_path stores only the subdirectory name train, the full path can be assembled in code as follows:

import os

# Build the full path from the dataset name and the short value stored in train_path
data['load']['train_path'] = os.path.join('~', 'data', data['dataset'], data['load']['train_path'])

However, this approach is not flexible enough. The amount of preprocessing code grows linearly with the number of configuration properties, which makes handling the configuration cumbersome.

Normal String Referencing ${}

We propose to construct the dictionary values as dynamic data, for example:

{
    "dataset": "spotify",
    "load": {
        "base_path": "~/data/${dataset}",
        "train_path": "${load.base_path}/train",
        "dev_path": "${load.base_path}/dev",
        "test_path": "${load.base_path}/test"
    },
    "network": {
        "num_hidden_layers": 3,
        "num_attention_heads": 8,
        "hidden_size": 64
    },
    "store": "checkpoints/${dataset}/${network.num_hidden_layers}L${network.num_attention_heads}H/"
}

The value of train_path is built dynamically by referencing load.base_path and appending /train, and base_path in turn references dataset. Likewise, the value of store is built by referencing dataset, network.num_hidden_layers, and network.num_attention_heads.
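To make the substitution rule concrete, here is a minimal sketch of how ${} references could be resolved. It is not smartdict's actual implementation: the names REF, lookup, and resolve are hypothetical, and the sketch assumes there are no circular references.

```python
import re

# Hypothetical pattern for this sketch: captures the path inside ${...}
REF = re.compile(r"\$\{([^{}]+)\}")


def lookup(root, path):
    """Walk a dot-separated path such as 'load.base_path' through nested dicts."""
    node = root
    for key in path.split("."):
        node = node[key]
    return node


def resolve(root, value):
    """Replace every ${path} in a string, resolving nested references recursively."""
    if not isinstance(value, str):
        return value  # numbers and other non-string values are used as they are
    return REF.sub(lambda m: str(resolve(root, lookup(root, m.group(1)))), value)


config = {
    "dataset": "spotify",
    "load": {
        "base_path": "~/data/${dataset}",
        "train_path": "${load.base_path}/train",
    },
    "network": {"num_hidden_layers": 3, "num_attention_heads": 8},
    "store": "checkpoints/${dataset}/${network.num_hidden_layers}L${network.num_attention_heads}H/",
}

train_path = resolve(config, config["load"]["train_path"])
store = resolve(config, config["store"])
print(train_path)  # ~/data/spotify/train
print(store)       # checkpoints/spotify/3L8H/
```

Because each reference is looked up at resolution time, changing "dataset" alone is enough to update every dependent path.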

The above configuration dictionary can be processed as follows:

# Option 1: module-level helper
import smartdict
data = smartdict.parse(data)

# Option 2: explicit compiler object (use either option, not both)
from smartdict import DictCompiler
compiler = DictCompiler(data)
data = compiler.parse()

print(data['load']['base_path'])  # => ~/data/spotify
print(data['load']['dev_path'])  # => ~/data/spotify/dev
print(data['store'])  # => checkpoints/spotify/3L8H/

We highly recommend using smartdict together with Oba, which lets the parsed values be read as attributes:

from oba import Obj

data = Obj(data)
print(data.load.base_path)  # => ~/data/spotify
print(data.load.dev_path)  # => ~/data/spotify/dev
print(data.store)  # => checkpoints/spotify/3L8H/

Full Match Referencing ${}$

In Normal String Referencing (${}), the reference is written as a chained path (segments separated by .) inside the ${} identifier, and the referenced value is substituted into the surrounding text, so the resulting value is a string.

Full Match Referencing requires the ${}$ identifier to cover the entire property value. The property then takes on the referenced value itself rather than a string built from it. For example:

import oba
import smartdict

data = dict(
    a='${b.v.1}+1',  # Normal String Referencing
    b='${c}$',  # Full Match Referencing
    c=dict(
        l=23,
        v=('are', 'you', 'ok'),
    )
)

data = smartdict.parse(data)
print(data['b'])  # => {'l': 23, 'v': ('are', 'you', 'ok')}

data = oba.Obj(data)
print(data.a)  # => you+1
print(data.b.l)  # => 23

Here, b is identical to c through Full Match Referencing. The path b.v.1 in a therefore reaches element 1 of the tuple c.v, which is 'you'.
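The difference between the two reference forms can be sketched as follows. This is an illustrative approximation, not smartdict's actual code: FULL, PART, lookup, and resolve are hypothetical names, and treating numeric path segments as sequence indices is an assumption inferred from the b.v.1 example.

```python
import re

FULL = re.compile(r"\$\{([^{}]+)\}\$")  # the whole value is one ${path}$
PART = re.compile(r"\$\{([^{}]+)\}")    # ${path} embedded in a longer string


def lookup(root, path):
    """Walk a dotted path; segments index into lists and tuples by position."""
    node = root
    for key in path.split("."):
        node = resolve(root, node)  # follow references before descending
        node = node[int(key)] if isinstance(node, (list, tuple)) else node[key]
    return resolve(root, node)


def resolve(root, value):
    if not isinstance(value, str):
        return value
    full = FULL.fullmatch(value)
    if full:
        # Full match: return the referenced object itself, keeping its type
        return lookup(root, full.group(1))
    # Normal reference: splice the referenced value into the string
    return PART.sub(lambda m: str(lookup(root, m.group(1))), value)


data = dict(
    a="${b.v.1}+1",
    b="${c}$",
    c=dict(l=23, v=("are", "you", "ok")),
)

a = resolve(data, data["a"])
b = resolve(data, data["b"])
print(a)  # you+1
print(b)  # {'l': 23, 'v': ('are', 'you', 'ok')}
```

In this sketch the full-match branch returns the very same dict object that c holds, which is why b.l and b.v behave exactly like c.l and c.v.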

Summon Magic

Sometimes, we may wish to generate the path through timestamps or random numbers. We can first construct the following two classes:

TimestampMagic

import datetime


class TimestampMagic(dict):
    def __init__(self):
        dict.__init__(self, {})

    def __contains__(self, item):
        # Accept any key so that every path segment resolves
        return True

    def __getitem__(self, item):
        now = datetime.datetime.now()
        if item == 'str':
            # Human-readable timestamp, e.g. 221110-123504
            return now.strftime('%y%m%d-%H%M%S')
        else:
            # Unix timestamp in hexadecimal, without the 0x prefix
            return hex(int(now.timestamp()))[2:]

RandomMagic

import random
import string


class RandomMagic(dict):
    chars = string.ascii_letters + string.digits

    def __init__(self):
        dict.__init__(self, {})

    def __contains__(self, item):
        # Accept any key so that every path segment resolves
        return True

    def __getitem__(self, item):
        # The key is the desired length of the random string
        return ''.join([random.choice(self.chars) for _ in range(int(item))])

Then, we can use the following configuration dictionary:

import smartdict

# TimestampMagic and RandomMagic are the two classes defined above

data = dict(
    filename='${utils.time.str}/${utils.rand.4}.log',  # Summon Magic, supported by smartdict>=0.0.4
)

data.update(dict(
    utils=dict(
        rand=RandomMagic(),
        time=TimestampMagic(),
    )
))
data = smartdict.parse(data)

print(data['filename'])  # => e.g. 221110-123504/to1E.log

The principle behind this is to override the class's [] operator (__getitem__), together with __contains__, so that the object can be indexed like a dictionary or list while computing its values on demand.
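The mechanism can be seen in isolation with a small sketch that does not depend on smartdict. UpperMagic and lookup are hypothetical names introduced only for this illustration, and the membership check in lookup is an assumption about why __contains__ is overridden.

```python
class UpperMagic(dict):
    """Hypothetical magic object: accepts any key and returns it upper-cased."""

    def __contains__(self, item):
        return True  # claim to contain every key

    def __getitem__(self, item):
        return str(item).upper()  # compute the value from the key itself


def lookup(root, path):
    """Walk a dotted path, checking membership before each indexing step."""
    node = root
    for key in path.split("."):
        if key not in node:
            raise KeyError(path)
        node = node[key]
    return node


data = {"utils": {"upper": UpperMagic()}}
shout = lookup(data, "utils.upper.hello")
print(shout)  # HELLO
```

The path walker treats the magic object like any other dictionary, so the last path segment acts as an argument to the object rather than as a stored key.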

License

MIT

https://liu.qijiong.work/2022/11/09/Develop-SmartDict/

Author

Qijiong LIU (Jyonn)

Posted on

2022-11-09

Updated on

2023-10-18
