What are data structures in Python programming?

Data structures are a fundamental concept of computer science that helps in writing efficient programs in any language. Python is a high-level, interpreted, interactive, and object-oriented scripting language, which lets us study the fundamentals of data structures more simply than many other programming languages.

In this chapter, we give a short overview of some frequently used data structures in general and how they relate to specific Python data types. Some data structures that are specific to Python are listed as a separate category.

General Data Structures

The various data structures in computer science are broadly divided into the two categories shown below. We will discuss each of these data structures in detail in subsequent chapters.

Linear Data Structures

These are the data structures which store the data elements in a sequential manner.

  • Array − It is a sequential arrangement of data elements paired with the index of the data element.

  • Linked List − Each data element contains a link to another element along with the data present in it.

  • Stack − It is a data structure that follows a specific order of operations: LIFO (Last In, First Out), also called FILO (First In, Last Out).

  • Queue − It is similar to a stack, but the order of operations is FIFO (First In, First Out).

  • Matrix − It is a two-dimensional data structure in which each data element is referred to by a pair of indices.
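As a minimal sketch of the linear structures above, Python's built-ins can already behave like them: a list works as a stack through append() and pop(), collections.deque works as a queue through append() and popleft(), and a list of lists can serve as a simple matrix. This is only an illustration with made-up values, not the implementations developed in the later chapters.

```python
from collections import deque

# Stack (LIFO): the last element pushed is the first one popped
stack = []
stack.append("a")
stack.append("b")
stack.append("c")
top = stack.pop()  # removes and returns "c"
print(top, stack)  # c ['a', 'b']

# Queue (FIFO): the first element added is the first one removed
queue = deque()
queue.append("a")
queue.append("b")
queue.append("c")
front = queue.popleft()  # removes and returns "a"
print(front, list(queue))  # a ['b', 'c']

# Matrix: a list of rows, where each element is addressed by a (row, column) pair
matrix = [[1, 2, 3], [4, 5, 6]]
print(matrix[1][2])  # 6
```

A deque is used for the queue because removing from the front of a plain list shifts every remaining element, while deque.popleft() does not.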

Non-Linear Data Structures

These are the data structures in which there is no sequential linking of data elements. Any pair or group of data elements can be linked to each other and can be accessed without a strict sequence.

  • Binary Tree − It is a data structure that starts with a root node, where each data element (node) can be connected to at most two child elements.

  • Heap − It is a special case of a tree data structure in which the value in each parent node is either greater than or equal to the values in its child nodes (a max-heap) or less than or equal to them (a min-heap).

  • Hash Table − It is a data structure that stores values in an array of slots (buckets), using a hash function to compute which slot a key belongs to. It retrieves values by key rather than by position index.

  • Graph − It is an arrangement of vertices (nodes) in which some pairs of vertices are connected to each other through links called edges.
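The non-linear structures above can be sketched in Python as follows. The Node class is a hypothetical helper written for this illustration; heapq is the standard-library module that keeps a min-heap inside an ordinary list; dict is Python's built-in hash table; and the graph is stored as an adjacency list (a dictionary mapping each vertex to its neighbours), which is one common representation among several.

```python
import heapq


class Node:
    """Hypothetical binary tree node: a value plus at most two children."""

    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


# Binary tree: a root node with two children
root = Node(1, Node(2), Node(3))
print(root.left.value, root.right.value)  # 2 3

# Heap: heapq keeps the smallest element at index 0 (a min-heap)
numbers = [7, 2, 9, 4]
heapq.heapify(numbers)
print(numbers[0])              # 2
print(heapq.heappop(numbers))  # 2
print(numbers[0])              # 4

# Hash table: dict hashes each key to find where its value is stored
capitals = {"Italy": "Rome", "France": "Paris"}
print(capitals["Italy"])  # Rome

# Graph: each vertex maps to the vertices it is linked to
graph = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": [],
}
print(graph["A"])  # ['B', 'C']
```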

Python Specific Data Structures

These data structures are specific to the Python language. They give greater flexibility in storing different types of data and faster processing in the Python environment.

  • List − It is similar to an array, except that the data elements can be of different data types. You can have both numeric and string data in a Python list.

  • Tuple − Tuples are similar to lists, but they are immutable, which means the values in a tuple cannot be modified; they can only be read.

  • Dictionary − The dictionary contains Key-value pairs as its data elements.

In the next chapters we are going to learn the details of how each of these data structures can be implemented using Python.

In this tutorial, you'll learn what data structures exist in Python, when to apply them, and their pros and cons. We'll talk about data structures in general, then dive deeper into Python data structures: lists, dictionaries, sets, and tuples.

What Is a Data Structure?

A data structure is a way of organizing data in computer memory, implemented in a programming language. This organization is required for efficient storage, retrieval, and modification of data. It is a fundamental concept as data structures are one of the main building blocks of any modern software. Learning what data structures exist and how to use them efficiently in different situations is one of the first steps toward learning any programming language.

Data Structures in Python

Built-in data structures in Python can be divided into two broad categories: mutable and immutable. Mutable (from Latin mutabilis, "changeable") data structures are those that we can modify, for example, by adding, removing, or changing their elements. Python has three basic mutable data structures: lists, dictionaries, and sets. Immutable data structures, on the other hand, are those that we cannot modify after their creation. Of the basic built-in data structures covered in this tutorial, the only immutable one is the tuple.

Python also has some advanced data structures, which can be implemented with basic data structures. However, these are rarely used in data science and are more common in software engineering and the implementation of complex algorithms, so we won't discuss them in this tutorial.

Different third-party Python packages implement their own data structures, like DataFrames in pandas or arrays in NumPy. However, we will not talk about them here, either, because these are the topics of more specific tutorials (such as How to Create and Use a Pandas DataFrame or NumPy Tutorial: Data Analysis with Python).

Let’s start with the mutable data structures: lists, dictionaries, and sets.

Lists

Lists in Python are implemented as dynamic mutable arrays which hold an ordered collection of items.

First, in many programming languages, arrays are data structures that contain a collection of elements of the same data type (for instance, all elements are integers). However, in Python, lists can contain heterogeneous data types and objects. For instance, integers, strings, and even functions can be stored within the same list. Different elements of a list can be accessed by integer indices, where the first element of a list has the index of 0. This property derives from the fact that in Python, lists are ordered, which means they retain the order in which you insert the elements into the list.

Next, we can arbitrarily add, remove, and change elements in the list. For instance, the append() method adds a new element to a list, and the remove() method removes an element from a list. Furthermore, by accessing a list's element by index, we can change it to another element. For more detail on the different list methods, please refer to the documentation.

Finally, when creating a list, we do not have to specify in advance the number of elements it will contain; therefore, it can be expanded as we wish, making it dynamic.

Lists are useful when we want to store a collection of different data types and subsequently add, remove, or perform operations on each element of the list (by looping through them). Furthermore, lists are useful to store other data structures (and even other lists) by creating, for instance, lists of dictionaries, tuples, or lists. It is very common to store a table as a list of lists (where each inner list represents a table’s column) for subsequent data analysis.
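The nesting described above can be sketched as follows. The names and ages are hypothetical example data; the first part follows the text's convention of one inner list per table column, and the second part stores the same records as a list of dictionaries, one per row.

```python
# A small table stored as a list of lists: each inner list is one column
names = ["Anna", "Tom", "Violet"]
ages = [31, 25, 28]
table = [names, ages]

# Perform an operation on the elements of one column
average_age = sum(table[1]) / len(table[1])
print(average_age)  # 28.0

# The same records stored as a list of dictionaries, one dictionary per row
records = [
    {"name": "Anna", "age": 31},
    {"name": "Tom", "age": 25},
]
records.append({"name": "Violet", "age": 28})

# Loop through the rows and collect one field from each
print([record["name"] for record in records])  # ['Anna', 'Tom', 'Violet']
```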

Thus, the pros of lists are:

  • They represent the easiest way to store a collection of related objects.
  • They are easy to modify by removing, adding, and changing elements.
  • They are useful for creating nested data structures, such as a list of lists/dictionaries.

However, they also have cons:

  • They can be pretty slow when performing arithmetic operations on their elements. (For speed, use NumPy’s arrays.)
  • They use more memory because of their under-the-hood implementation: a list stores references to separate Python objects rather than the raw values.

Examples

Finally, let’s have a look at a few examples.

We can create a list using either square brackets ([]) with zero or more elements between them, separated by commas, or the list() constructor. The latter can also be used to transform certain other data structures into lists.

# Create an empty list using square brackets
l1 = []

# Create a four-element list using square brackets
l2 = [1, 2, "3", 4]  # Note that this list contains two different data types: integers and strings

# Create an empty list using the list() constructor
l3 = list()

# Create a three-element list from a tuple using the list() constructor
# We will talk about tuples later in the tutorial
l4 = list((1, 2, 3))

# Print out lists
print(f"List l1: {l1}")
print(f"List l2: {l2}")
print(f"List l3: {l3}")
print(f"List l4: {l4}")
List l1: []
List l2: [1, 2, '3', 4]
List l3: []
List l4: [1, 2, 3]

We can access a list’s elements using indices, where the first element of a list has the index of 0:

# Print out the first element of list l2
print(f"The first element of the list l2 is {l2[0]}.")
print()

# Print out the third element of list l4
print(f"The third element of the list l4 is {l4[2]}.")
    The first element of the list l2 is 1.

    The third element of the list l4 is 3.

We can also slice lists and access multiple elements simultaneously:

# Assign the third and the fourth elements of l2 to a new list
l5 = l2[2:]

# Print out the resulting list
print(l5)
    ['3', 4]

Note that we did not have to specify the index of the last element we wanted to access if we wanted all elements from index 2 (included) to the end of the list. Generally speaking, the list slicing works as follows:

  1. Open square brackets.
  2. Write the index of the first element we want to access. This element will be included in the output. Place a colon after this index.
  3. Write the index of the last element we want to access, plus one. The addition of 1 is required because the element under the index we write here will not be included in the output.
  4. Close the square brackets.

Let’s show this behavior with an example:

print(f"List l2: {l2}")

# Access the second and the third elements of list l2 (these are the indices 1 and 2)
print(f"Second and third elements of list l2: {l2[1:3]}")
    List l2: [1, 2, '3', 4]
    Second and third elements of list l2: [2, '3']

Note that the last index we specified is 3, not 2, even though we wanted to access the element under index 2. Thus, the last index we write is not included.

You can experiment with different indices and bigger lists to understand how indexing works.

Now let's demonstrate that lists are mutable. For example, we can append a new element to a list or remove a specific element from it:

# Append a new element to the list l1
l1.append(5)

# Print the modified list
print("Appended 5 to the list l1:")
print(l1)

# Remove element 5 from the list l1
l1.remove(5)

# Print the modified list
print("Removed element 5 from the list l1:")
print(l1)
    Appended 5 to the list l1:
    [5]
    Removed element 5 from the list l1:
    []

Additionally, we can modify the elements that are already in the list by accessing the required index and assigning a new value to that index.
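Here is a minimal sketch of index assignment. It recreates the list l2 from the examples above so that the block runs on its own, then replaces the string "3" under index 2 with the integer 3.

```python
# Recreate the list l2 from the earlier examples
l2 = [1, 2, "3", 4]
print(f"List l2 before: {l2}")

# Assign a new value to index 2, replacing the string "3" with the integer 3
l2[2] = 3
print(f"List l2 after: {l2}")
```

The list object itself stays the same; only the reference stored at index 2 changes, which is exactly what "mutable" means here.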

Of course, we have only scratched the surface of what is possible with Python lists. You can learn more from this course.

Dictionaries

Dictionaries in Python are very similar to real-world dictionaries. These are mutable data structures that contain a collection of keys and, associated with them, values. This structure makes them very similar to word-definition dictionaries. For example, the word dictionary (our key) is associated with its definition (value) in the Oxford online dictionary: a book or electronic resource that gives a list of the words of a language in alphabetical order and explains what they mean, or gives a word for them in a foreign language.

Dictionaries are used to quickly access certain data associated with a unique key. Uniqueness is essential, as we need to access only certain pieces of information and not confuse them with other entries. Imagine we want to read the definition of Data Science, but a dictionary redirects us to two different pages: which one is the correct one? Note that technically we can write a dictionary with two or more identical keys, but Python keeps only the last value assigned to a repeated key, so it is not advisable.
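The following sketch, using a hypothetical glossary dictionary, shows what happens when a dictionary literal repeats a key: Python does not store two entries, and the later value silently overwrites the earlier one.

```python
# A dictionary literal that repeats the key "Data Science" (hypothetical definitions)
glossary = {
    "Data Science": "first definition",
    "Data Science": "second definition",
}

# Only one entry survives, holding the last value written for that key
print(len(glossary))             # 1
print(glossary["Data Science"])  # second definition
```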

We use dictionaries when we are able to associate (in technical terms, to map) a unique key to certain data, and we want to access that data very quickly (in constant time on average, no matter the dictionary size). Moreover, dictionary values can be pretty complex. For example, our keys can be customer names, and their personal data (values) can be dictionaries with keys like "Age", "Hometown", etc.
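The customer example can be sketched as a dictionary whose values are themselves dictionaries. The customer names and details below are hypothetical; only the "Age" and "Hometown" keys come from the text.

```python
# Each key is a customer name; each value is a dictionary of that customer's details
customers = {
    "Anna Rossi": {"Age": 31, "Hometown": "Padua"},
    "Tom Smith": {"Age": 25, "Hometown": "Leeds"},
}

# Look up a customer by key, then a detail inside the nested dictionary
print(customers["Anna Rossi"]["Hometown"])  # Padua

# Checking whether a key exists is also a fast dictionary operation
print("Tom Smith" in customers)  # True
```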

Thus, the pros of dictionaries are:

  • They make code much easier to read if we need to generate key:value pairs. We can also do the same with a list of lists (where inner lists are pairs of "keys" and "values"), but this looks more complex and confusing.
  • We can look up a certain value in a dictionary very quickly. In contrast, with a list, we would have to scan the list until we hit the required element. This difference grows drastically as the number of elements increases.

However, their cons are:

  • They occupy a lot of space. If we need to handle a large amount of data, this is not the most suitable data structure.
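  • In Python 3.6 and later versions, dictionaries preserve the insertion order of their keys (an implementation detail in CPython 3.6, guaranteed by the language since Python 3.7); earlier versions do not. Keep that in mind to avoid compatibility issues when using the same code in different versions of Python.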

Examples

Let's now take a look at a few examples. First, we can create a dictionary with curly brackets ({}) or the dict() constructor:
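A minimal sketch of both creation styles, mirroring the list examples earlier. The names and ages are hypothetical example data.

```python
# Create an empty dictionary using curly brackets
d1 = {}

# Create a dictionary of names and ages using curly brackets
d2 = {"Anna": 31, "Tom": 25}

# Create an empty dictionary using the dict() constructor
d3 = dict()

# Create the same name-age dictionary with dict() and keyword arguments
d4 = dict(Anna=31, Tom=25)

print(f"Dictionary d1: {d1}")
print(f"Dictionary d2: {d2}")
print(f"Dictionary d3: {d3}")
print(f"Dictionary d4: {d4}")
```

The keyword-argument form only works when the keys are valid Python identifiers; for other keys, use curly brackets.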

Now let's access an element in a dictionary. We can do this with square brackets, as with lists, but we use a key instead of an index:
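A short sketch of key-based access, using the same hypothetical name-age dictionary as above. It also shows the get() method, which returns a default value instead of raising a KeyError when the key is missing.

```python
d2 = {"Anna": 31, "Tom": 25}

# Access a value by its key
print(f"Anna's age is {d2['Anna']}.")  # Anna's age is 31.

# get() returns the given default when the key is absent
print(d2.get("Violet", "unknown"))  # unknown
```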

Next, we can also modify dictionaries, for example, by adding new key:value pairs:
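As a sketch with the same hypothetical dictionary, assigning to a key that does not exist yet adds a new pair, while assigning to an existing key replaces its value.

```python
d2 = {"Anna": 31, "Tom": 25}

# Add a new key:value pair
d2["Violet"] = 28

# Replace the value of an existing key
d2["Tom"] = 26

print(d2)  # {'Anna': 31, 'Tom': 26, 'Violet': 28}
```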

As we can see, a new key, "Violet", has been added.

It's also possible to remove elements from a dictionary; try to find a way to do so by reading the documentation. Furthermore, you can read a more in-depth tutorial on Python dictionaries (with tons of examples) or have a look at DataQuest's dictionary lesson.

Sets

Sets in Python can be defined as mutable dynamic collections of immutable unique elements. The elements contained in a set must be immutable. Sets may seem very similar to lists, but in reality, they are very different.

First, they may only contain unique elements, so no duplicates are allowed. Thus, sets can be used to remove duplicates from a list. Next, like sets in mathematics, they have unique operations which can be applied to them, such as set union, intersection, etc. Finally, they are very efficient in checking whether a specific element is contained in a set.

Thus, the pros of sets are:

  • We can perform mathematical set operations on them, such as union, intersection, and difference.
  • They are significantly faster than lists if we want to check whether a certain element is contained in a set.

But their cons are:

  • Sets are intrinsically unordered. If we care about keeping the insertion order, they are not our best choice.
  • We cannot change set elements by indexing as we can with lists.

Examples

To create a set, we can use either curly brackets with the elements between them or the set() constructor. Note that empty curly brackets ({}) create an empty dictionary, not a set, so an empty set must be created with set(). Do not confuse sets with dictionaries (which also use curly brackets), as sets do not contain key:value pairs. Note, though, that like with dictionary keys, only immutable data structures or types are allowed as set elements. This time, let's directly create populated sets:
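A minimal sketch of both ways to create a populated set. The names are hypothetical example data. The second example passes a list (an iterable) to set(), and the duplicate "Tom" is stored only once, which is how sets remove duplicates from a list.

```python
# Create a populated set using curly brackets
s1 = {"Anna", "Tom", "Violet"}

# Create a set from a list with the set() constructor; duplicates are dropped
s2 = set(["Tom", "Violet", "Marco", "Tom"])

print(len(s1), len(s2))  # 3 3

# Sets are unordered, so sort them before printing to get a stable display
print(sorted(s2))  # ['Marco', 'Tom', 'Violet']
```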

In the second example, we used an iterable (such as a list) to create a set. However, if we used lists as set elements, Python would throw an error. Why do you think it happens? Tip: read the definition of sets.

To practice, you can try using other data structures to create a set.

As with their math counterparts, we can perform certain operations on our sets. For example, we can create a union of sets, which basically means merging two sets together. If the two sets share some values, each shared value appears only once in the resulting set. There are two ways to create a union: either with the union() method or with the vertical bar (|) operator. Let's look at an example:
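A sketch of the union with two hypothetical sets of names that share "Tom" and "Violet". Both the method and the operator produce the same set, and the shared names appear only once.

```python
s1 = {"Anna", "Tom", "Violet"}
s2 = {"Tom", "Violet", "Marco"}

# Union with the union() method and with the | operator
u1 = s1.union(s2)
u2 = s1 | s2

print(sorted(u1))  # ['Anna', 'Marco', 'Tom', 'Violet']
print(u1 == u2)    # True
```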

In the above union, we can see that the names present in both sets appear only once, even though we merged two sets.

Next, we may also want to find out which names appear in both sets. This can be done with the intersection() method or the ampersand (&) operator:
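A sketch of the intersection, reusing the same two hypothetical sets of names so the block runs on its own.

```python
s1 = {"Anna", "Tom", "Violet"}
s2 = {"Tom", "Violet", "Marco"}

# Intersection with the intersection() method and with the & operator
i1 = s1.intersection(s2)
i2 = s1 & s2

print(sorted(i1))  # ['Tom', 'Violet']
print(i1 == i2)    # True
```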

The names that appear in both sets are returned by the set intersection.

The last example of set operations is the difference between two sets. In other words, this operation returns all the elements that are present in the first set but not in the second one. We can use either the difference() method or the minus sign (-):
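A sketch of the set difference with the same hypothetical sets. Only "Anna" is in the first set but not in the second.

```python
s1 = {"Anna", "Tom", "Violet"}
s2 = {"Tom", "Violet", "Marco"}

# Difference with the difference() method and with the - operator
diff1 = s1.difference(s2)
diff2 = s1 - s2

print(diff1)           # {'Anna'}
print(diff1 == diff2)  # True
```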

What would happen if you swapped the positions of the sets? Try to predict the result before trying it.

There are other operations that can be used on sets. For more information, refer to this tutorial.

Finally, as a bonus, let's compare how fast sets and lists are at checking whether an element is contained in them.
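A minimal timing sketch using the standard-library timeit module. The data (100,000 integers) and the number of repetitions are arbitrary assumptions. The target is the last value, which is the worst case for the list because it is scanned from the start. The exact timings depend on your machine, so no output is shown here.

```python
import timeit

# The same 100,000 integers stored in a list and in a set
numbers_list = list(range(100_000))
numbers_set = set(numbers_list)

# Worst case for the list: the element is at the very end
target = 99_999

# Time 200 membership checks for each data structure
list_time = timeit.timeit(lambda: target in numbers_list, number=200)
set_time = timeit.timeit(lambda: target in numbers_set, number=200)

print(f"List: {list_time:.6f} s")
print(f"Set:  {set_time:.6f} s")
```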

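Membership checks are considerably faster with sets than with lists: a set finds an element by its hash (in constant time on average), whereas a list is scanned element by element. This difference grows for larger sets and lists.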

Tuples

Tuples are almost identical to lists: they contain an ordered collection of elements. They differ in one property: tuples are immutable. We would use tuples if we needed a data structure that, once created, cannot be modified anymore. Furthermore, tuples can be used as dictionary keys if all their elements are immutable.

Other than that, tuples have the same properties as lists. To create a tuple, we can either use round brackets (()) or the tuple() constructor. We can easily transform lists into tuples and vice versa (recall that we created the list l4 from a tuple).

The pros of tuples are:

  • They are immutable, so once created, we can be sure that we won't change their contents by mistake.
  • They can be used as dictionary keys if all their elements are immutable.
  • In CPython, they typically occupy slightly less memory than lists with the same elements.

The cons of tuples are:

  • We cannot use them when we have to work with modifiable objects; we have to resort to lists instead.
  • Unlike lists, they have no copy() method (because they are immutable, copying them is rarely needed).

Examples

Let’s take a look at some examples:

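A minimal sketch of tuple creation and conversion, mirroring the list examples earlier. The variable names and values are illustrative. Note the trailing comma needed for a one-element tuple: without it, the round brackets are just grouping and (5) is the integer 5.

```python
# Create an empty tuple using round brackets
t1 = ()

# Create a four-element tuple with two different data types
t2 = (1, 2, "3", 4)

# A one-element tuple needs a trailing comma
t3 = (5,)

# Create a tuple from a list using the tuple() constructor
t4 = tuple([1, 2, 3])

# Convert a tuple back into a list with list()
l7 = list(t4)

print(t1, t2, t3, t4, l7)  # () (1, 2, '3', 4) (5,) (1, 2, 3) [1, 2, 3]
```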

Is it possible to create tuples from other data structures (i.e., sets or dictionaries)? Try it for practice.

Tuples are immutable; thus, we cannot change their elements once they are created. Let’s see what happens if we try to do so:

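The following sketch tries to replace an element of a tuple and catches the resulting error so the script keeps running. It then shows the workaround: convert the tuple to a list, change the list, and build a new tuple.

```python
t2 = (1, 2, "3", 4)

raised = False
try:
    t2[2] = 3  # tuples do not support item assignment
except TypeError as error:
    raised = True
    print(f"TypeError: {error}")

# Workaround: modify a list copy, then build a new tuple from it
as_list = list(t2)
as_list[2] = 3
t2 = tuple(as_list)
print(t2)  # (1, 2, 3, 4)
```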

It is a TypeError! Tuples do not support item assignment because they are immutable. To solve this problem, we can convert the tuple into a list.

However, we can access elements in a tuple by their indices, like in lists:

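A short sketch of read-only access to a tuple by index and by slice. Slicing a tuple returns a new tuple.

```python
t2 = (1, 2, "3", 4)

# Access single elements by index
print(t2[0])   # 1
print(t2[-1])  # 4

# Slice the tuple, exactly as with lists
print(t2[1:3])  # (2, '3')
```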

Tuples can also be used as dictionary keys. For example, we may store certain elements and their consecutive indices in a tuple and assign values to them:

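As a sketch, one common use of tuple keys is storing values under a pair of indices. The grid of (row, column) coordinates below is a hypothetical example.

```python
# Tuples of (row, column) indices used as dictionary keys
cells = {
    (0, 0): "A",
    (0, 1): "B",
    (1, 0): "C",
}

# Look up the value stored under a pair of indices
print(cells[(0, 1)])  # B
```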

If you use a tuple as a dictionary key, then the tuple must contain immutable objects:

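The sketch below contrasts a tuple key that contains a list (mutable, so it cannot be hashed) with one that contains only immutable objects. The error is caught so the rest of the block still runs.

```python
key = (1, [2, 3])  # a tuple that contains a list, which is mutable

raised = False
try:
    bad_dict = {key: "value"}
except TypeError as error:
    raised = True
    # The message names the unhashable type (here, list)
    print(f"TypeError: {error}")

# A tuple containing only immutable objects works as a key
ok_dict = {(1, (2, 3)): "value"}
print(ok_dict[(1, (2, 3))])  # value
```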

We get a TypeError if our tuples/keys contain mutable objects (lists in this case).

Conclusions

Let’s wrap up what we have learned from this tutorial:

  • Data structures are a fundamental concept in programming, required for storing and retrieving data efficiently.
  • Python has four main data structures split between mutable (lists, dictionaries, and sets) and immutable (tuples) types.
  • Lists are useful to hold a heterogeneous collection of related objects.
  • We need dictionaries whenever we need to link a key to a value and to quickly access some data by a key, like in a real-world dictionary.
  • Sets allow us to perform operations, such as intersection or difference, on them; thus, they are useful for comparing two sets of data.
  • Tuples are similar to lists, but are immutable; they can be used as data containers that we do not want to modify by mistake.


About the author

Artur Sannikov

I am a Molecular Biology student at the University of Padua, Italy, interested in bioinformatics and data analysis.
