Python: Reconstruct a dictionary by splitting a formatted key string

The module I used returned a flat dictionary with no nested objects. It looks like this:

data = {
    "0\\price": 1.5,
    "0\\name": "Apple",
    "0\\stock": 50,
    "1\\price": 1.2,
    "1\\name": "Orange",
    "1\\stock": 22,
    "2\\price": 1100.4,
    "2\\name": "nice-pc",
    "2\\stock": 5,
}

Each item has three properties. This flat structure is hard to work with when we need to process the items one by one.

The goal of this article is to convert it to the following structure.

{
    "0": {"price": 1.5,    "name": "Apple",   "stock": 50},
    "1": {"price": 1.2,    "name": "Orange",  "stock": 22},
    "2": {"price": 1100.4, "name": "nice-pc", "stock": 5},
}
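
To show why the nested form is easier to handle, here is a minimal sketch that loops over the target structure item by item. The variable name nested and the printed message are only illustrative.

```python
nested = {
    "0": {"price": 1.5, "name": "Apple", "stock": 50},
    "1": {"price": 1.2, "name": "Orange", "stock": 22},
    "2": {"price": 1100.4, "name": "nice-pc", "stock": 5},
}

# Each item is a small dict, so its properties can be read by name
for index, item in nested.items():
    print(f"{index}: {item['name']} ({item['stock']} in stock)")
# 0: Apple (50 in stock)
# 1: Orange (22 in stock)
# 2: nice-pc (5 in stock)
```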

Check how to get keys and values

There are several ways to get the keys and values of a dictionary.

for key in data:
    print(f"{key}, {data[key]}")
# 0\price, 1.5
# 0\name, Apple
# 0\stock, 50
# 1\price, 1.2
# 1\name, Orange
# 1\stock, 22
# 2\price, 1100.4
# 2\name, nice-pc
# 2\stock, 5
print(data.keys())
# dict_keys(['0\\price', '0\\name', '0\\stock', '1\\price', '1\\name', '1\\stock', '2\\price', '2\\name', '2\\stock'])
print(data.values())
# dict_values([1.5, 'Apple', 50, 1.2, 'Orange', 22, 1100.4, 'nice-pc', 5])
print(data.items())
# dict_items([('0\\price', 1.5), ('0\\name', 'Apple'), ('0\\stock', 50), ('1\\price', 1.2), ('1\\name', 'Orange'), ('1\\stock', 22), ('2\\price', 1100.4), ('2\\name', 'nice-pc'), ('2\\stock', 5)])

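A dictionary holds key-value pairs, so you might think the following code also works.

for key, value in data:
    print(f"{key}, {value}")
# ValueError: too many values to unpack (expected 2)

But it raises an error. Iterating over a dictionary directly yields only its keys, so Python tries to unpack each key string into two variables and fails.

The items() method is the simplest choice because it yields each key together with its value.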
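
As a minimal sketch of the items() approach, the loop below unpacks each pair directly. It uses a shortened copy of the sample dictionary, and the name sample is only for illustration.

```python
sample = {
    "0\\price": 1.5,
    "0\\name": "Apple",
}

# items() yields (key, value) tuples, so two loop variables work here
for key, value in sample.items():
    print(f"{key}, {value}")
# 0\price, 1.5
# 0\name, Apple
```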

How to get index value in the key

A key looks like "0\\price". The leading 0 is the index of one item (product). It can be extracted with a regular expression.

import re

for key in data.keys():
    # Raw string so that \d reaches the regex engine unchanged
    result = re.search(r"(\d+)", key)
    if result:
        print(f"{result}  -> {result.group(1)}")
    else:
        print("Not found")
# <re.Match object; span=(0, 1), match='0'>  -> 0
# <re.Match object; span=(0, 1), match='0'>  -> 0
# <re.Match object; span=(0, 1), match='0'>  -> 0
# <re.Match object; span=(0, 1), match='1'>  -> 1
# <re.Match object; span=(0, 1), match='1'>  -> 1
# <re.Match object; span=(0, 1), match='1'>  -> 1
# <re.Match object; span=(0, 1), match='2'>  -> 2
# <re.Match object; span=(0, 1), match='2'>  -> 2
# <re.Match object; span=(0, 1), match='2'>  -> 2

group(0) returns the entire match.
group(1) returns the first parenthesized subgroup.

The pattern above consists of a single pair of parentheses around the whole expression, so group(0) and group(1) return the same value. Let’s check another example.

result = re.search(r"(\d+) (\d+ .+)", "111 222 333")
if result:
    print(result.group(0))  # 111 222 333
    print(result.group(1))  # 111
    print(result.group(2))  # 222 333

group(0) returns the entire match of the regex.
group(1) returns the match of the first parenthesized subgroup, and group(2) returns the match of the second.
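
When all subgroups are needed at once, the match object also provides groups(), which returns them as a tuple. This is a small sketch using the same pattern as above; the variable names first and rest are only for illustration.

```python
import re

result = re.search(r"(\d+) (\d+ .+)", "111 222 333")
if result:
    # groups() returns every subgroup, starting from group(1)
    first, rest = result.groups()
    print(result.groups())  # ('111', '222 333')
    print(first)            # 111
    print(rest)             # 222 333
```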

Extracting the index and actual key for the index

Let’s look at the key-value pairs again.

data = {
    "0\\price": 1.5,
    "0\\name": "Apple",
    ...
}

We need to extract 0 as the index and price as the key of the nested object, so the regex needs a second group.

for key in data.keys():
    # group(1): index before the backslash, group(2): key after it
    result = re.search(r"(\d+)\\(.+)", key)
    if result:
        print(f"index: {result.group(1)}, key: {result.group(2)}")
    else:
        print("Not found")
# index: 0, key: price
# index: 0, key: name
# index: 0, key: stock
# index: 1, key: price
# index: 1, key: name
# index: 1, key: stock
# index: 2, key: price
# index: 2, key: name
# index: 2, key: stock

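The key is written with two backslashes in the source code, but "\\" is an escape sequence, so the actual string contains a single backslash. A backslash is also a special character in a regex, so the regex needs \\ (an escaped backslash) to match it literally. In a normal Python string literal each of those backslashes must be escaped again, which gives "\\\\"; a raw string such as r"\\" avoids this second level of escaping. With this regex, we can get the index by group(1) and the actual key by group(2).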
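
The escaping levels can be checked with a short sketch. The names plain_pattern and raw_pattern are only for illustration; both spell the same regex.

```python
import re

key = "0\\price"
print(len(key))  # 7, because "\\" is one backslash character
print(key)       # 0\price

# The same regex written as a normal string and as a raw string
plain_pattern = "(\\d+)\\\\(.+)"
raw_pattern = r"(\d+)\\(.+)"
print(plain_pattern == raw_pattern)  # True

match = re.search(raw_pattern, key)
print(match.group(1), match.group(2))  # 0 price
```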

Recreating the dictionary object

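The last step is to store each value under its index and key. We already have the index and the actual key.

A new nested dictionary needs to be created whenever the index changes. This is done by the if index != last_base_key check. Note that this approach assumes that all keys with the same index appear consecutively; otherwise an already filled nested dictionary is replaced with an empty one.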

import re

restructure = {}
last_base_key = None
for key, value in data.items():
    result = re.search(r"(\d+)\\(.+)", key)
    if not result:
        raise ValueError(f"Unexpected format: {key}")

    index = result.group(1)
    key = result.group(2)
    # Start a new nested dict whenever the index changes
    if index != last_base_key:
        restructure[index] = {}
    last_base_key = index
    restructure[index][key] = value
print(restructure)
# {'0': {'price': 1.5, 'name': 'Apple', 'stock': 50}, '1': {'price': 1.2, 'name': 'Orange', 'stock': 22}, '2': {'price': 1100.4, 'name': 'nice-pc', 'stock': 5}}

The dictionary is now restructured, which makes post-processing easier.
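
The loop above relies on keys with the same index being adjacent. As an alternative sketch (not the approach used in this article), dict.setdefault creates the nested dictionary only the first time an index is seen, so the key order no longer matters. The shuffled sample data below is made up for illustration.

```python
import re

shuffled = {
    "0\\price": 1.5,
    "1\\price": 1.2,
    "0\\name": "Apple",
    "1\\name": "Orange",
}

restructured = {}
for key, value in shuffled.items():
    match = re.search(r"(\d+)\\(.+)", key)
    if not match:
        raise ValueError(f"Unexpected format: {key}")
    index, prop = match.group(1), match.group(2)
    # setdefault returns the existing nested dict or inserts a new empty one
    restructured.setdefault(index, {})[prop] = value

print(restructured)
# {'0': {'price': 1.5, 'name': 'Apple'}, '1': {'price': 1.2, 'name': 'Orange'}}
```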

Generalizing the logic

The key format depends on whoever produced the data. Let’s generalize the logic so that it can be used for various key formats.

import re


def restructure_dict(data, reg_str):
    """Group a flat dict into nested dicts; reg_str must capture index and key."""
    result = {}
    last_base_key = None
    for key, value in data.items():
        search_result = re.search(reg_str, key)
        if not search_result:
            raise ValueError(f"Unexpected format found: {key}")

        index = search_result.group(1)
        key = search_result.group(2)
        # Start a new nested dict whenever the index changes
        if index != last_base_key:
            result[index] = {}
        last_base_key = index
        result[index][key] = value
    return result


data = {
    "apple:prop1": "1-1",
    "apple:prop2": "1-2",
    "apple:prop3": "1-3",
    "honey:prop1": "2-1",
    "honey:prop2": "2-2",
    "honey:prop3": "2-3",
    "juice:prop1": "3-1",
    "juice:prop2": "3-2",
    "juice:prop3": "3-3",
}
print(restructure_dict(data, "(.+):(.+)"))
# {
#   'apple': {'prop1': '1-1', 'prop2': '1-2', 'prop3': '1-3'}, 
#   'honey': {'prop1': '2-1', 'prop2': '2-2', 'prop3': '2-3'}, 
#   'juice': {'prop1': '3-1', 'prop2': '3-2', 'prop3': '3-3'}
# }

Since we don’t know the key format in advance, it must be passed as a parameter. The regex string must contain two capturing groups (two pairs of parentheses): the first for the index and the second for the key.
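
One way to enforce the two-group requirement is to check the groups attribute of the compiled pattern before using it. This is only a sketch; the helper name check_two_groups is hypothetical and not part of the code above.

```python
import re


def check_two_groups(reg_str):
    """Return reg_str unchanged, or raise ValueError if it has fewer than two capturing groups."""
    # Compiled patterns expose the number of capturing groups as .groups
    group_count = re.compile(reg_str).groups
    if group_count < 2:
        raise ValueError(
            f"Pattern needs two capturing groups, found {group_count}: {reg_str}"
        )
    return reg_str


print(check_two_groups("(.+):(.+)"))  # (.+):(.+)
```

The check could be called at the top of restructure_dict so that a wrong pattern fails early with a clear message instead of failing later on group(2).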