Demystified: Tuple vs. Dict
The real distinction between tuples and dicts, and how it relates to Dependency Injection, Joining of SQL Tables and more…
Technically, tuples and dicts are equivalent in terms of content: Tuples select the feature by positional index, while dicts select by a unique identifier called key.
Note: In this article, I prefer to use tuple and dict as in Python, since there is no universal name for both. If you come from a Go background, it’s struct and map respectively.
Table of Contents
1.1: Information is Stored as Tuple by Default
1.2: Tuples Can Be Compared Among Themselves
1.3: Tuples Cannot Interact With Other Types of Tuples
2.1: Dependency Injection: What, How, Why?
2.2: Mocking Using Dependency Injection
2.3: How Dependency Injection Relates to Tuples vs. Dicts
3: Step-by-Step Construction of Dicts in Maths: Multiset, Ordered Set, Abstract Tuple, Dict
4.1: Dependency Injection on Person-Animal Comparison
4.2: How Merging Two Tuples Would Work Using Dicts
4.3: How Tuple vs. Dict Relationship Is Related to Joining Datasets
Information is Stored as Tuple by Default
Unlike dicts, the features of tuples are strictly defined, and also their domains are known, which is why tuples are much safer to use.
Before looking up a feature from a tuple, you need to first make sure that the tuple contains this feature and the position of feature is known.
Tuples Can Be Compared Among Themselves
With this in mind, tuples of same kind can be compared and sorted based on their features. For example:
- Let’s define a person signature as follows: P_i = (a_i, w_i, h_i) where a_i is “age”, w_i is “weight” and h_i is “height”.
- f is a function which stands for “isYounger”: f(P_1, P_2) = a_1 < a_2.
- g is a function which stands for ”isTallerOrHeavier”: g(P_1, P_2) = h_1 > h_2 ∨ w_1 > w_2.
- Two instances are given: P_1 = (18, 80, 1.79) and P_2 = (21, 84, 1.85). f(P_1, P_2) outputs “YES” and g(P_1, P_2) outputs “NO”.
- Selecting a comparison function of our wish as the ranking criteria among person tuples, we can sort persons.
Tuples Cannot Interact With Other Types of Tuples
But what if we wanted to compare animals too?
- Let’s say animals have a different signature: A_j = (z_j, h_j, w_j, a_j) where z_j is an additional feature standing for the “genus of an animal”.
- We would have to define P2P (“person-to-person”), P2A, A2P and A2A comparing functions.
- For example: l(P_1 = (a_1, w_1, h_1), A_2 = (z_2, h_2, w_2, a_2)) = w_1 < w_2 for compares the weights of a person and an animal.
- Even without an additional feature z_i, the comparison function would be a mess, since the order of features play a key role.
So it’s pretty clear that tuples are not capable of interacting with other types of tuples.
As we’ll later see, this problem can be solved using clever tricks, such as casting the tuples to dicts and performing the comparison. But first, we have to gain an understanding of a very powerful technique called dependency injection.
Dependency Injection: What, How, Why?
Given:
class Person { int age; int weight; int height; }
class Animal { string genus; int height; int weight; int age; }
bool compare (Person either, Person other){ return either.age < other.age; }
which defines the pairwise comparison criteria of people.
Task:
- Extend the code so that we can compare humans with animals too.
Solution:
- Extract a common comparable interface for persons and animals:
interface AgeComparable { int age(); }
- Implement
int age()
in people classclass Person implements AgeComparable {int age; int weight; int height; int age() { return this.age; }; }
(and the same applies for animals) - Extend the comparing function by interface:
bool compare (AgeComparable either, AgeComparable other){ return either.age() < other.age(); }
, so that comparing function only specifies what feature it needs, without having to know how to get the feature.
Mocking Using Dependency Injection
Dependency Injection comes in very handy in situations where testing an operation can affect the whole system in an undesirable manner.
Mocking allows you to isolate the operation from the real environment and perform tests in sandboxes, for example:
- You want to test an operation
int post_content(string content, Server server) { return server.post("localhost:8080", content); }
which internally makes a real HTTP request usingclass Server { int post(string content) { // real http request }; }
. - For this, you extract an
interface ServerInterface { int post(string content);
and adaptclass Server implements ServerInterface { ... }
, then create aclass MockServer implements ServerInterface { int post(string content) { return 0; }; }
with a “harmless” post operation and finally update the function parameterint post_content (string content, ServerInterface server) { ... }
.
How Dependency Injection Relates to Tuples vs. Dicts
In the most basic words, dicts are dependency-injected tuples, which serve as a common interface of features across different tuples and used mainly in case the features of tuples are not necessarily known.
Formally, it can be seen as the key-value pairs (k, v) inside a set where each key is mapped to a value uniquely.
For the simplicity, I borrow the same notations from my previous article “Sets & Tuples”, which deals with the two most basic data structures used in maths. I recommend you to take a look if you haven’t seen.
Step-by-Step Construction of Dicts in Maths
Multiset
The key difference from sets is that each element can occur multiple times in multisets. It is notated as v_1, v_2, … instead of {v_1, v_2, …}
Ordered Set
Let’s define a tuple (i, v_i) for each i ∈ ℕ whereas v_i ∈ {v_1, v_2, …} inside a set called V_ℕ = {(i, v_i) | i ∈ ℕ, v_i ∈ {v_1, v_2, …}}.
We could name V_ℕ as an ordered set, where each i acts as a unique identifier for each v_i.
However, it cannot be a building block of dict, because the values in V_ℕ are restricted unnecessarily so that no value v_i ∈ {v_1, v_2, …} can occur multiple times.
Abstract Tuple
Switching value domains to multiset v_i ∈ v_1, v_2, … instead, now we can guarantee values to occur multiple times instead.
The modified V_ℕ qualifies as an abstract tuple or a building block of a dict, which allows us to look up items in a set by their index.
The indexing is done by filtering: V_ℕ(j) = {(i, v_i) ∈ V_ℕ | i = j}. Unlike tuple indexing, it can be performed regardless of whether the item exists, rather than “throwing a no-such-item error”.
It returns a set {(j, v_j)} with j-th key-value pair if it exists, otherwise returns an empty set {}
Dict
The difference between a dict and an abstract tuple is that the elements are indexed not by counter i, but unique key identifiers k_i.
So, additionally, a dict requires a hash function h: K → ℕ which maps the key identifier to an index uniquely, also this should always evaluate to true: ∀(k_i, k_j) ∈ K × K: h(k_i) = h(k_j) → k_i = k_j.
Assuming such a function exists, now we’re ready to create a dict called V_K = {(h(k_i), v_i) | k_i ∈ K, v_i ∈ v_1, v_2, …}, which is the same as V_ℕ but with h as a perfect built-in hash function.
Lookup by key k is performed similarly: H_i(k) = {(h(k_i), v_i) ∈ H_i | h(k_i) = h(k)}
Dependency Injection on Person-Animal Comparison
Let’s go back to the first example where we compare person’s age and a species’ age. Now we can cast:
- person tuple P_i = (a_i, w_i, h_i) to a dict H_i = {(h(“age”), a_i), (h(“weight”), w_i), (h(“height”), h_i)}, and
- animal tuple A_j = (z_j, h_j, w_j, a_j) to another dict H_j = {(h(“genus”), z_j), (h(“height”), h_j), (h(“weight”), w_j), (h(“age”), a_j))}.
Now, looking up the age of human has the same underlying complexity as animal. Below are some interesting analogies:
interface AgeComparable { int age(); }
is a dict H which contains an age feature.class Person implements AgeComparable { int age; ...; int age() { return this.age; }; }
is a tuple P with a defined age endpoint for H.bool compare (AgeComparable either, AgeComparable other) { return either.age() < other.age(); }
:either.age()
corresponds to the value in H_i(“age”).- In
compare(...)
: H_i(“age”) cannot be empty, because the typeAgeComparable
requires each class to implement anage(...)
function.
How Merging Two Tuples Would Work Using Dicts
Likewise, you can use the same operation while merging two different tuples, e.g. X = (a_i, b_i, c_i) and Y = (a_j, d_j) to H = {“a”: π(a_i, a_j), “b”: b_i, “c”: c_i, “d”: d_j} and then to Z = (a, b, c, d).
Hereby we define a function π: A × A → A which selects any of both arguments based on a selecting criteria.
It corresponds to the custom implementation of a constructor in a class:
class Z implements XInterface, YInterface { int a, b, c, d; Z(X x, Y y) { this.a = x.a; // same as π(x.a, y.a) = x.a ... }; int a() { return this.a; }; ... }
,
given class X implements XInterface {...}
and the same for class Y.
Internally, that happens:
- Cast X to H_x = {(h(“a”), a_i), (h(“b”), b_i), (h(“c”), c_i)} and Y likewise to H_y = {(h(“a”), a_j), (h(“d”), d_j)}
- Define the set of keys for H_z as the unified set of the keys of H_x and H_y: K(H_z) = K(H_x) ∪ K(H_y) = {“a”, “b”, “c”} ∪ {“a”, “d”} = {“a”, “b”, “c”, “d”}
- For each key k in K(H_z), perform lookup for both H_x and H_y.
- If the result of H_x(k) and H_y(k) is in conflict, select one of them using v = π(…, …), otherwise exactly one of them should return a non-empty result v = …
- After initializing v, update H_z = H_z ∪ {(h(k), v)}
- Extract a new tuple Z from H_z simply by getting the values: Z = (π(a_i, a_j), b_i, c_i, d_j)
How Tuple vs. Dict Relationship Is Related to Joining Datasets
I discussed previously in one of my articles, how “Logic Is Built On Relations”.
It revolves around relations ( = subsets of sets of tuples), simplistically the same as binary information holders.
Signature corresponds to the list of columns of a dataset, and a structure to a dataset itself.
The idea behind the signature — structure relationship is roughly the same as tuple — dict relationship:
- Why a structure ( = a class) is defined as a tuple,
- How a signature ( = an interface) serves as a common interface for structures with the same special methods like a set of keys in a dict, and
- How the merging of two structures results as a new structure, similarly to how the merging of X and Y results as a new tuple Z.
Conclusion
Tuples are the main data structures for storing information in mathematical universe.
Although tuples are strict, efficient and completely safe to use, these features are also associated with restrictions and inflexibility.
Dicts are the intermediary data structures used for performing “whichever task a tuple cannot perform”.
Algebraically, it’s not different than a set of key-value pairs with a built-in hash function, yet it is much more powerful for tasks such as performing lookups, merging tuples, etc.
Unsurprisingly, this relationship is already known, see common analogies such as class — interface and signature — structure.
Using mainly tuple — dict analogy, it’s somehow interesting to resurface some best practices for Dependency Injection.
References
Fig 5: https://www.tutorialsteacher.com/ioc/dependency-injection
Fig 6: https://jenkov.com/tutorials/java-unit-testing/testing-with-dependency-injection-containers.html