The module I used returned a flat dictionary without any nested objects. It looks like this:
data = {
    "0\\price": 1.5,
    "0\\name": "Apple",
    "0\\stock": 50,
    "1\\price": 1.2,
    "1\\name": "Orange",
    "1\\stock": 22,
    "2\\price": 1100.4,
    "2\\name": "nice-pc",
    "2\\stock": 5,
}
Each item has 3 properties, but they are spread across separate top-level keys. If we need to do something with each item, it’s hard to work with this structure.
The goal of this article is to convert it to the following structure.
{
    "0": {"price": 1.5, "name": "Apple", "stock": 50},
    "1": {"price": 1.2, "name": "Orange", "stock": 22},
    "2": {"price": 1100.4, "name": "nice-pc", "stock": 5},
}
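To illustrate why the nested form is easier to work with, here is a minimal sketch that computes a per-item inventory value. The variable names `products` and `inventory_value`, and the calculation itself, are assumptions made for illustration; they are not part of the module's output.

```python
# Hypothetical example: `products` is the target structure shown above.
products = {
    "0": {"price": 1.5, "name": "Apple", "stock": 50},
    "1": {"price": 1.2, "name": "Orange", "stock": 22},
    "2": {"price": 1100.4, "name": "nice-pc", "stock": 5},
}

# Each item is a single dict, so all of its properties are available together.
inventory_value = {
    item["name"]: item["price"] * item["stock"]
    for item in products.values()
}
print(inventory_value)
```

With the flat structure, the same calculation would first require matching up the price and stock keys that belong to the same item.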
Check how to get keys and values
There are several ways to get the keys and values.
for key in data:
    print(f"{key}, {data[key]}")
# 0\price, 1.5
# 0\name, Apple
# 0\stock, 50
# 1\price, 1.2
# 1\name, Orange
# 1\stock, 22
# 2\price, 1100.4
# 2\name, nice-pc
# 2\stock, 5
print(data.keys())
# dict_keys(['0\\price', '0\\name', '0\\stock', '1\\price', '1\\name', '1\\stock', '2\\price', '2\\name', '2\\stock'])
print(data.values())
# dict_values([1.5, 'Apple', 50, 1.2, 'Orange', 22, 1100.4, 'nice-pc', 5])
print(data.items())
# dict_items([('0\\price', 1.5), ('0\\name', 'Apple'), ('0\\stock', 50), ('1\\price', 1.2), ('1\\name', 'Orange'), ('1\\stock', 22), ('2\\price', 1100.4), ('2\\name', 'nice-pc'), ('2\\stock', 5)])
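A dictionary holds key-value pairs, so you might expect the following code to work as well.
for key, value in data:
    print(f"{key}, {value}")
# ValueError: too many values to unpack (expected 2)
But this raises an error. Iterating over a dictionary directly yields only its keys, so Python tries to unpack each key string into two variables and fails.
To iterate over keys and values together, the items() method is the simplest choice.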
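As a minimal sketch of the difference, the snippet below iterates with items() and then shows that unpacking a single key fails. It uses a shortened copy of the sample data; the names `lines`, `first_key`, and `unpacked` are introduced only for this illustration.

```python
data = {
    "0\\price": 1.5,
    "0\\name": "Apple",
    "0\\stock": 50,
}

# items() yields (key, value) tuples, so unpacking into two names works.
lines = [f"{key}, {value}" for key, value in data.items()]
for line in lines:
    print(line)
# 0\price, 1.5
# 0\name, Apple
# 0\stock, 50

# Iterating over the dict itself yields only the keys, which are strings.
first_key = next(iter(data))
try:
    key, value = first_key  # a 7-character string cannot fill 2 names
    unpacked = True
except ValueError:
    unpacked = False
print(unpacked)  # False
```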
How to get index value in the key
The key looks like "0\\price". The leading 0 is the index of one of the items (products). It can be extracted with a regex.
import re

for key in data.keys():
    # A raw string keeps \d from being treated as a string escape.
    result = re.search(r"(\d+)", key)
    if result:
        print(f"{result} -> {result.group(1)}")
    else:
        print("Not found")
# <re.Match object; span=(0, 1), match='0'> -> 0
# <re.Match object; span=(0, 1), match='0'> -> 0
# <re.Match object; span=(0, 1), match='0'> -> 0
# <re.Match object; span=(0, 1), match='1'> -> 1
# <re.Match object; span=(0, 1), match='1'> -> 1
# <re.Match object; span=(0, 1), match='1'> -> 1
# <re.Match object; span=(0, 1), match='2'> -> 2
# <re.Match object; span=(0, 1), match='2'> -> 2
# <re.Match object; span=(0, 1), match='2'> -> 2
group(0) returns the entire match.
group(1) returns the first parenthesized subgroup.
The example above has only one pair of parentheses, and it encloses the whole pattern, so group(0) and group(1) return the same value. Let’s check another example.
result = re.search(r"(\d+) (\d+ .+)", "111 222 333")
if result:
    print(result.group(0))  # 111 222 333
    print(result.group(1))  # 111
    print(result.group(2))  # 222 333
Here group(0) returns the entire match of the regex, group(1) returns the first parenthesized subgroup, and group(2) returns the second.
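As a small supplement, a match object also has a groups() method that returns all parenthesized subgroups at once as a tuple (group 0 is not included). The sketch below reuses the same pattern and input; the names `first` and `rest` are chosen only for this example.

```python
import re

result = re.search(r"(\d+) (\d+ .+)", "111 222 333")
if result:
    # groups() returns every parenthesized subgroup as a tuple.
    first, rest = result.groups()
    print(first)  # 111
    print(rest)   # 222 333
```

This can be handy when a pattern has exactly as many groups as the values you want to unpack.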
Extracting the index and actual key for the index
Let’s look at the key-value pairs again.
data = {
    "0\\price": 1.5,
    "0\\name": "Apple",
    ...
}
We need to extract 0 as the index and price as the key in the object. We need to update the regex.
for key in data.keys():
    # \\ in the raw string matches one literal backslash in the key.
    result = re.search(r"(\d+)\\(.+)", key)
    if result:
        print(f"index: {result.group(1)}, key: {result.group(2)}")
    else:
        print("Not found")
# index: 0, key: price
# index: 0, key: name
# index: 0, key: stock
# index: 1, key: price
# index: 1, key: name
# index: 1, key: stock
# index: 2, key: price
# index: 2, key: name
# index: 2, key: stock
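The key is written with two backslashes in the source code, but the string itself contains only one, because a backslash must be escaped in a normal string literal. The regex has the same rule: to match a literal backslash, it needs an escaped backslash, that is, two backslashes. In a normal string literal each of those has to be doubled again, which gives four; in a raw string, two are enough. With this regex, we can get the index with group(1) and the actual key with group(2).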
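The escaping rules can be checked with a short sketch. It compares the pattern written as a normal string and as a raw string; the variable names `normal_pattern` and `raw_pattern` are introduced here for illustration, and the normal string also escapes `\d` as `\\d` to avoid an invalid-escape warning.

```python
import re

# The same regex written as a normal string and as a raw string.
normal_pattern = "(\\d+)\\\\(.+)"
raw_pattern = r"(\d+)\\(.+)"
print(normal_pattern == raw_pattern)  # True

key = "0\\price"  # the string contains ONE backslash: 0\price
print(len(key))  # 7

match = re.search(raw_pattern, key)
print(match.groups())  # ('0', 'price')
```

Raw strings are usually the easier choice for regexes because only the regex-level escaping remains.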
Recreating the dictionary object
The last step is to store each value under its index and actual key. We already have both.
A new dictionary object needs to be created when the index changes. This is what the if index != last_base_key check does.
restructure = {}
last_base_key = None
for key, value in data.items():
    result = re.search(r"(\d+)\\(.+)", key)
    if not result:
        raise Exception("Unexpected format")
    index = result.group(1)
    key = result.group(2)
    if index != last_base_key:
        restructure[index] = {}
        last_base_key = index
    restructure[index][key] = value
print(restructure)
# {'0': {'price': 1.5, 'name': 'Apple', 'stock': 50}, '1': {'price': 1.2, 'name': 'Orange', 'stock': 22}, '2': {'price': 1100.4, 'name': 'nice-pc', 'stock': 5}}
We have now restructured the dict object, which makes post-processing easier.
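Note that the loop above assumes all keys with the same index are adjacent. If they were interleaved, the index comparison would replace an item that was already partly filled. The following is a minimal sketch of a variant that does not depend on key order, using dict.setdefault instead of tracking the last index. The `interleaved` sample data is made up for illustration and is not output from the module.

```python
import re

# Hypothetical input where keys of the same item are not adjacent.
interleaved = {
    "0\\price": 1.5,
    "1\\price": 1.2,
    "0\\name": "Apple",
    "1\\name": "Orange",
}

restructure = {}
for key, value in interleaved.items():
    result = re.search(r"(\d+)\\(.+)", key)
    if not result:
        raise ValueError(f"Unexpected format: {key}")
    index, prop = result.groups()
    # setdefault creates the inner dict only the first time an index is seen.
    restructure.setdefault(index, {})[prop] = value

print(restructure)
# {'0': {'price': 1.5, 'name': 'Apple'}, '1': {'price': 1.2, 'name': 'Orange'}}
```

If the module always returns the keys grouped by index, the original loop works just as well.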
Generalizing the logic
The key format depends on the author of the module. Let’s generalize the logic so that it can be used with various key formats.
import re

def restructure_dict(data, reg_str):
    # reg_str must capture the index in group 1 and the actual key in group 2.
    result = {}
    last_base_key = None
    for key, value in data.items():
        search_result = re.search(reg_str, key)
        if not search_result:
            raise Exception(f"Unexpected format found: {key}")
        index = search_result.group(1)
        key = search_result.group(2)
        # Start a new inner dict whenever the index changes.
        if index != last_base_key:
            result[index] = {}
            last_base_key = index
        result[index][key] = value
    return result
data = {
    "apple:prop1": "1-1",
    "apple:prop2": "1-2",
    "apple:prop3": "1-3",
    "honey:prop1": "2-1",
    "honey:prop2": "2-2",
    "honey:prop3": "2-3",
    "juice:prop1": "3-1",
    "juice:prop2": "3-2",
    "juice:prop3": "3-3",
}
print(restructure_dict(data, "(.+):(.+)"))
# {
# 'apple': {'prop1': '1-1', 'prop2': '1-2', 'prop3': '1-3'},
# 'honey': {'prop1': '2-1', 'prop2': '2-2', 'prop3': '2-3'},
# 'juice': {'prop1': '3-1', 'prop2': '3-2', 'prop3': '3-3'}
# }
Since the key format is not known in advance, it must be passed in as a parameter. The regex string must contain two capturing groups (pairs of parentheses): the first for the index and the second for the actual key.
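The two-group requirement could be checked up front. The sketch below relies on the `groups` attribute of a compiled pattern, which holds the number of capturing groups. The helper name `check_two_groups` is hypothetical and not part of the code in this article; requiring exactly two groups is also an assumption, since the function above only reads groups 1 and 2.

```python
import re

def check_two_groups(reg_str):
    # Hypothetical helper: Pattern.groups is the number of capturing groups.
    pattern = re.compile(reg_str)
    if pattern.groups != 2:
        raise ValueError(
            f"Expected exactly 2 capturing groups, got {pattern.groups}: {reg_str}"
        )
    return pattern

check_two_groups("(.+):(.+)")  # passes, two groups

try:
    check_two_groups("(.+):.+")  # only one capturing group
except ValueError as err:
    print(err)
```

Calling such a check before the loop would report a bad pattern once, instead of failing on the first key.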