Demystifying Python JSON, dictionaries and lists
"Life is like an onion, you peel off layer by layer and sometimes you cry" - Carl Sandburg
I assume the same goes for extracting values from nested JSON structures. Even the most experienced programmer can be brought to tears when working with a JSON object composed of a mixture of deeply nested data structures. The process of extracting the values can feel chaotic and disorganized at best. The more data there is, the bigger the mess becomes.
In this tutorial, I will guide you through a step-by-step method to extract the values you need from JSON. A word of warning: this tutorial is not intended for newcomers to JSON, lists, or dictionaries. If you've never heard of a list index or a dictionary key-value pair, I would suggest reading one of the many great tutorials available on the web or YouTube. Once you feel more comfortable with the topic, come back to continue learning and growing.
JSON vs Lists vs Dictionaries
First things first: when it comes to the terms "JSON", "list", and "dictionary", we need to do some important housekeeping. JSON, or JavaScript Object Notation, is a broader format used to encapsulate dictionary and list structures, as shown in the image below.
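As a small illustration of that relationship, the sketch below parses JSON text with Python's standard json module: a JSON object becomes a dict and a JSON array becomes a list. The two sample strings are made up for this example and are not part of the tutorial's data set.

```python
import json

# Hypothetical JSON text: one object (key-value pairs) and one array (ordered values).
object_text = '{"name": "Afghanistan", "region": "Asia"}'
array_text = '["Kabul", "Herat", "Kandahar"]'

parsed_object = json.loads(object_text)
parsed_array = json.loads(array_text)

print(type(parsed_object))  # a JSON object is parsed into a Python dictionary
print(type(parsed_array))   # a JSON array is parsed into a Python list
```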
The technical documentation states that JSON is built on two structures: a collection of key-value pairs and an ordered list of values. In Python programming, key-value pairs are dictionary objects and ordered lists are list objects. In practice, the starting point for nested data extraction is either a dictionary or a list data structure. When extracting nested data, the questions to ask are: Is the data nested in a dictionary or a list data structure? What combination of data structures is used? Is the first data structure a dictionary or a list?
"It has long been a principle of mine that the little things matter infinitely." - Sir Arthur Conan Doyle
If it looks like I'm making a big deal of the terminology, it's because I am. When extracting nested data, it's the details that count. Data structures change the deeper the data is nested in the JSON structure, and knowing these differences is important. The initial data structure can be a list, but then change to a dictionary as the data is extracted. The key to extracting data from a JSON object is recognizing the mix of data structures used to store the data. If you struggle to identify the data structures in a JSON object, you will probably have trouble extracting the values you want. In most cases, this leads to applying the incorrect extraction technique.
The following table is a quick refresher on the techniques used to extract data from a JSON structure.
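In code, the two basic techniques look roughly like this. The sample values and the variable names countries and currency are made up for illustration: a list is read by position, a dictionary by key.

```python
# Hypothetical sample values, for illustration only.
countries = ['Afghanistan', 'Albania', 'Algeria']      # list: ordered values
currency = {'code': 'AFN', 'name': 'Afghan afghani'}   # dictionary: key-value pairs

first_country = countries[0]       # list: integer index inside square brackets
currency_code = currency['code']   # dictionary: key inside square brackets

print(first_country)
print(currency_code)
```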
One last note before we start our example. In Python programming, the term "data structure" is rarely used when dealing with lists and dictionaries. The common term is "data type". I use the terms data type and data structure interchangeably in this tutorial. I prefer the term data structure because it conveys the idea that these structures are the basic building blocks of the JSON object. Python's term data type is no less important, but it doesn't convey the same meaning as a key to understanding nested data extraction.
Let's start
One of the best ways to learn is by manipulating real data with a mix of list and dictionary data structures. In this tutorial we use real data from the REST Countries API. This API returns about 250 records with a mix of dictionaries, lists, and other data types. Our goal is to extract the 'AFN' value from the dictionary key-value pair 'code': 'AFN', as shown in the picture below. The 'AFN' value is nested in two list structures and two dictionary structures.
The sample code
Clicking this link gives you access to the sample code used in the following examples. The link takes you to a course I developed on extracting nested JSON data. The course has helped hundreds of students learn how to extract nested data. You don't need to purchase the course to get the files. The filenames are single_json.py and multiple_json.py.
Extract individual elements
In this example, we'll start extracting data using a combination of list and dictionary extraction techniques, as shown in the preceding table. In the following Python code, I start by providing the logic to import the data, then the solution, and then the workflow used to derive the solution. I recommend following all the steps as shown below. The workflow steps are explained below the Python code.
Python code:
Workflow steps:
Step 1: import requests
This line imports the Requests HTTP library for Python. It's the library we'll use to connect to a RESTful API. If you haven't already installed it, you can install it from the command prompt or a virtual environment using the pip install requests command.
Step 2: url = 'https://restcountries.eu/rest/v2/all'
This line stores the web address for the REST API. The address is saved in the url variable.
Step 3: response = requests.get(url)
This method is used to connect to the RESTful API at https://restcountries.eu/rest/v2/all to extract the data. The returned data is stored in the response variable. In technical jargon, this is referred to as a response object.
Step 4: storage = response.json()
This returns the JSON-decoded content of the response (if the response body is not valid JSON, an error is raised). Think of JSON as a storage format for exchanging data. In this case, we store the content in the storage variable.
Step 5: print(type(storage))
This returns the Python data type used to store the data. In this case, the returned data type is a list (<class 'list'>). Looking back at the table I provided earlier, the data can be extracted using a list index [0, ...]. You should always use the type() function to determine the data type. If you know the data type, you know the right extraction technique.
Step 6: print(len(storage))
This returns the number of items in the list. The items are indexed from 0 to len(storage) - 1, and any of these indexes can be used to extract a value.
Step 7: print(storage[0])
The 0 represents the first item in the list and is used to extract that item from the list. Once the item has been extracted, a new data type is exposed. The type() function in step 8 is used to show the data type.
Step 8: print(type(storage[0]))
The new data type is <class 'dict'>. The dictionary means that the data can be extracted using a key. In our data, the key used to extract the value is currencies. The currencies key is used in step 9. Take note: the data type has changed from <class 'list'> in step 5 to <class 'dict'> in step 8.
Step 9: print(storage[0]['currencies'])
The currencies key in the dictionary is used to output [{'code': 'AFN', 'name': 'Afghan afghani', 'symbol': ''}]. Once the item has been extracted, a new data type is exposed. The type() function in step 10 is used to show the data type.
Step 10: print(type(storage[0]['currencies']))
The new data type is <class 'list'>. The list data type means we use the index operator [] to extract the next set of values. In this case, the index value is 0, so we use it in step 11. Notice that the data type has changed from <class 'dict'> in step 8 to <class 'list'> in step 10.
Step 11: print(storage[0]['currencies'][0])
The index [0] is used to output {'code': 'AFN', 'name': 'Afghan afghani', 'symbol': ''}. Once the item has been extracted, a new data type is exposed. The type() function in step 12 is used to show the data type.
Step 12: print(type(storage[0]['currencies'][0]))
The new data type is <class 'dict'>. The dictionary means that the data can be extracted using a key. In our data, the key used to extract the value is code. The code key is used in step 13.
Step 13: print(storage[0]['currencies'][0]['code'])
The code key is used to output the AFN value. That's the value we want to extract.
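Put together, the steps above can be sketched as follows. To keep the example runnable without a network connection, a small hard-coded JSON string stands in for the API response. That sample text is an assumption that copies only the fields discussed in this walkthrough; with live data, the storage variable would come from requests.get(url).json() as described in steps 1 to 4.

```python
import json

# Assumed stand-in for the API response: one country record holding only
# the fields this walkthrough touches.
sample_text = '''
[
  {
    "name": "Afghanistan",
    "currencies": [
      {"code": "AFN", "name": "Afghan afghani", "symbol": ""}
    ]
  }
]
'''

storage = json.loads(sample_text)            # stands in for step 4

print(type(storage))                         # step 5: the outer structure is a list
print(len(storage))                          # step 6: number of items in the list
print(storage[0])                            # step 7: first item in the list
print(type(storage[0]))                      # step 8: that item is a dictionary
print(storage[0]['currencies'])              # step 9: value stored under 'currencies'
print(type(storage[0]['currencies']))        # step 10: that value is a list
print(storage[0]['currencies'][0])           # step 11: first item of the inner list
print(type(storage[0]['currencies'][0]))     # step 12: that item is a dictionary
code = storage[0]['currencies'][0]['code']   # step 13: the value we are after
print(code)
```

Each print(type(...)) line decides which technique the next line uses, which is the whole point of alternating the two.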
To summarize, there are two steps that are repeated until we arrive at the value we want to extract. The first step is to determine the data type, and the second step is to apply the matching extraction method. If the data type is a list, use the index operator with an integer inside square brackets. If the data type is a dictionary, use the dictionary key, which also goes inside square brackets; curly braces only appear when a dictionary is written out or printed.
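The two-step routine can also be expressed as a small helper. The function below, extract_path, is a hypothetical name introduced for this sketch and is not part of the tutorial's sample files; it checks the data type at each level and applies the matching technique.

```python
def extract_path(data, path):
    """Follow a sequence of list indexes and dictionary keys into nested data."""
    current = data
    for step in path:
        # First step: determine the data type.
        if isinstance(current, list):
            # Second step (list): extract by integer index.
            if not isinstance(step, int):
                raise TypeError(f'a list needs an integer index, got {step!r}')
            current = current[step]
        elif isinstance(current, dict):
            # Second step (dictionary): extract by key.
            current = current[step]
        else:
            raise TypeError(f'cannot extract {step!r} from {type(current).__name__}')
    return current


# Made-up sample shaped like the data used in this tutorial.
sample = [{'currencies': [{'code': 'AFN', 'name': 'Afghan afghani'}]}]
result = extract_path(sample, [0, 'currencies', 0, 'code'])
print(result)
```

Raising TypeError when a key is used on a list mirrors the mistake the tutorial warns about: applying the wrong extraction technique to a data type.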
Extract multiple items
While extracting a single list item from a JSON structure is an important first step, needing just a single value is not common. In real data, values are stored in JSON objects as collections. In the picture below, the currencies and code dictionary keys and values have multiple entries in the list data structure. Our real-world example has 250 of these entries, so our goal in this section of the tutorial is to extract the first value along with all the remaining ones.
Fortunately, we can extract these values by building on the workflow steps we used to extract a single value from the JSON structure. I will not list these steps again. Since the list and dictionary data structures are iterable, we can use a for loop structure to traverse our values. So let's add that to our code in step 14. We also need to use the range and len functions to enumerate the elements in the list. The workflow steps are explained below the Python code.
Python code:
Workflow steps:
Step 14: for item in range(len(storage)):
The storage variable contains the list (<class 'list'>) returned in step 4. The len function is used to count the number of items in the list. The range function generates a sequence of numbers based on the number of items in the list. Each number is passed to the item variable as the for loop iterates through the storage list. In line 48, the print(storage[item]['currencies'][0]['code']) shown in step 14 is from step 13. We've placed it inside the for loop structure and modified the code: storage[0] → storage[item]. storage[0] refers to a single item in the list, whereas storage[item] captures each element of the storage list in turn. Every time the for loop iterates, item is incremented, so storage[item] captures the next set of values.
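A minimal sketch of the loop in step 14, again using a hard-coded sample in place of the API response. The three records below are assumptions for illustration and contain only the fields the loop reads. The last two lines show an alternative that iterates over the list directly, which is a common Python idiom and not something taken from the tutorial's files.

```python
# Assumed stand-in for the list returned by response.json().
storage = [
    {'name': 'Afghanistan', 'currencies': [{'code': 'AFN'}]},
    {'name': 'Albania', 'currencies': [{'code': 'ALL'}]},
    {'name': 'Algeria', 'currencies': [{'code': 'DZD'}]},
]

# Step 14: range(len(storage)) yields 0, 1, 2, and item is used as the list index.
codes = []
for item in range(len(storage)):
    code = storage[item]['currencies'][0]['code']
    codes.append(code)
    print(code)

# Alternative: iterate over the records themselves instead of their indexes.
codes_direct = [country['currencies'][0]['code'] for country in storage]
print(codes_direct)
```

Both versions read only the first entry of each currencies list, just as step 13 does; a record with several currencies would need an inner loop over that list.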
The process of finding nested data can feel daunting, cumbersome, and annoying at times. It does not readily lend itself to introductory techniques such as simply iterating through dict.items(), dict.keys(), dict.values(), or list indexes []. The complexity of the JSON structure inevitably requires switching between dictionary and list extraction techniques to extract the data. Getting the hang of it takes a repeatable process, but most importantly, practice. I have included several additional practice problems with solutions (Click here).
"If you don't practice, someone else will do better" - Allen Iverson
Good luck.