Handling Primitive Types

Source: Data types, pt. 1 by M. Lim, Medium

Mutability in ReasonML is opt-in and expressed through records: a field must be explicitly marked mutable before it can be changed in place. Here’s how you define a mutable type that can hold any value:
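
A minimal sketch (the names `mut` and `value` are illustrative):

```reason
/* A polymorphic record with a single mutable field */
type mut('a) = {
  mutable value: 'a,
};
```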

In typed λ-calculus, a function type like f: a => b can be read as a transformation — or, via the Curry–Howard correspondence, a proof — taking you from a to b. The functions toMut and fromMut serve as bridges between the immutable and mutable representations, showing the two are interchangeable:
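
One possible implementation, assuming the `mut` type above:

```reason
/* Wrap a plain value in a mutable record */
let toMut = (x: 'a): mut('a) => {value: x};

/* Unwrap it again; nothing is lost in either direction */
let fromMut = (m: mut('a)): 'a => m.value;
```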

This mirrors how the standard library implements ref. Using that implementation as a reference, you can see how it translates in practical terms:
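
The standard library’s `ref` follows the same recipe — roughly `type ref('a) = { mutable contents: 'a };`:

```reason
let counter = ref(0);    /* create: the toMut analogue    */
counter := counter^ + 1; /* mutate the contents in place  */
print_endline(string_of_int(counter^)); /* read: the fromMut analogue; prints 1 */
```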

ReasonML requires explicit type declarations for union types, unlike TypeScript, where you can write an inline union such as “number | null”:
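
For instance, a nullable integer must be spelled out as its own variant (the names here are illustrative):

```reason
type nullableInt =
  | Value(int)
  | Null;
```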

Variant types in ReasonML, akin to sum types (tagged unions) in other languages, have mutually exclusive constructors — cases never overlap — and the compiler checks that pattern matches over them are exhaustive.

Handling these types calls for a switch expression that covers all possible variants; the compiler flags any constructor you leave out:
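
A sketch using the `nullableInt` type above:

```reason
let describe = (x: nullableInt): string =>
  switch (x) {
  | Value(n) => "int: " ++ string_of_int(n)
  | Null => "null"
  };
```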

For pattern matching, especially when not all cases are addressed explicitly, the wildcard “_” can be a fallback. The “when” keyword allows for additional conditions in matches, offering a concise way to handle complex patterns:
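
Something along these lines:

```reason
let classify = (x: nullableInt): string =>
  switch (x) {
  | Value(n) when n < 0 => "negative"
  | Value(0) => "zero"
  | _ => "something else" /* catches positive values and Null */
  };
```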

Testing different cases to demonstrate how these functions work with the defined types:
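
For example:

```reason
print_endline(describe(Value(42))); /* int: 42        */
print_endline(describe(Null));      /* null           */
print_endline(classify(Value(-1))); /* negative       */
print_endline(classify(Value(0)));  /* zero           */
print_endline(classify(Null));      /* something else */
```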

Database Storage

Source: Python Pandas DataFrame, geeksforgeeks.org

Working with generic types and function handling is one thing, but let’s step into a scenario where the stakes are higher: we’re data scientists at a medical center, tasked with deriving basic statistics from patient data. Given a raw database in text-file format, we’ve managed to extract the key information as follows:
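
The exact records aren’t important; a hypothetical extract might look like this, with each row a list of (key, value) pairs:

```reason
let rawRows = [
  [("name", "Alice"), ("age", "34"), ("weight", "61.5"), ("gender", "F")],
  [("age", "29"), ("name", "Bob"), ("gender", "M")], /* weight missing */
  [("name", "Carol"), ("weight", "58.0"), ("gender", "F"), ("hobby", "chess")], /* extra key */
];
```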

The data arrives row-wise, in arbitrary order, and may contain duplicate, missing, or extra items. To streamline debugging, we first convert each row into a string map.

Here’s where the Map.Make functor comes into play: it generates a StringMap module keyed by strings, giving us rows that read and manipulate much like JSON objects:
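
A sketch of the setup and the row conversion:

```reason
module StringMap = Map.Make(String);

/* Fold a row of (key, value) pairs into a map;
   duplicate keys simply overwrite earlier values */
let rowToMap = (row: list((string, string))): StringMap.t(string) =>
  List.fold_left(
    (m, (k, v)) => StringMap.add(k, v, m),
    StringMap.empty,
    row,
  );
```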

Converting our map to a JSON-like string for debug-friendliness:
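
Roughly:

```reason
let mapToString = (m: StringMap.t(string)): string => {
  let body =
    StringMap.bindings(m)
    |> List.map(((k, v)) => "\"" ++ k ++ "\": \"" ++ v ++ "\"")
    |> String.concat(", ");
  "{ " ++ body ++ " }";
};

print_endline(mapToString(rowToMap(List.hd(rawRows))));
/* { "age": "34", "gender": "F", "name": "Alice", "weight": "61.5" } */
```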

Next, the row-wise data is transformed into a generic database format. Here, we’re all about clarity and correctness:
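
One way to model it:

```reason
/* The generic database: one string map per row */
type rawDatabase = list(StringMap.t(string));

let rawDb: rawDatabase = List.map(rowToMap, rawRows);
```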

Accessing the elements becomes straightforward. Missing elements? No biggie, we handle that gracefully:
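
`find_opt` makes the missing case explicit:

```reason
let lookup = (key: string, m: StringMap.t(string)): string =>
  switch (StringMap.find_opt(key, m)) {
  | Some(v) => v
  | None => "<missing>"
  };

print_endline(lookup("weight", List.nth(rawDb, 1))); /* <missing> */
```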

After some exploration and analysis, key insights emerge regarding age, weight, name, and gender. These form the cornerstone of our new database signature:
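
A record captures exactly those four fields:

```reason
type patient = {
  name: string,
  age: int,
  weight: float,
  gender: string,
};
```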

Structuring a strictly-typed database while ensuring all necessary keys are included and defaults are set for missing items:
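
A sketch with defensive parsing (the default values are arbitrary):

```reason
/* Read a key with a fallback */
let getOr = (key, default, m) =>
  switch (StringMap.find_opt(key, m)) {
  | Some(v) => v
  | None => default
  };

/* Parse numerics without raising on malformed input */
let intOr = (default, s) =>
  switch (int_of_string_opt(s)) {
  | Some(n) => n
  | None => default
  };

let floatOr = (default, s) =>
  switch (float_of_string_opt(s)) {
  | Some(f) => f
  | None => default
  };

let toPatient = (m: StringMap.t(string)): patient => {
  name: getOr("name", "unknown", m),
  age: getOr("age", "", m) |> intOr(0),
  weight: getOr("weight", "", m) |> floatOr(0.0),
  gender: getOr("gender", "unknown", m),
};

let patients = List.map(toPatient, rawDb);
```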

Finally, the database is correctly formatted. Next up, transposing this database to a column-wise format, ready for statistical analysis:
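
One straightforward transpose:

```reason
/* Column-wise layout: one list per field */
type columns = {
  names: list(string),
  ages: list(int),
  weights: list(float),
  genders: list(string),
};

let transpose = (db: list(patient)): columns => {
  names: List.map(p => p.name, db),
  ages: List.map(p => p.age, db),
  weights: List.map(p => p.weight, db),
  genders: List.map(p => p.gender, db),
};

let cols = transpose(patients);
```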

With the data successfully transposed, we’re set to dive deep into the analysis, equipped with a database that’s both comprehensive and accurately structured.

Statistics Library

Source: What is the difference between pandas “describe” and “describe()”?, Stack Overflow

Venturing into building a generic statistics library, we’re focusing on versatility across different data structures. Whether dealing with Lists or Arrays, the goal is seamless integration and handling.

The challenge lies in unpacking these varied structures into a consistent format, ideally a list of primitives, for statistical analysis. Given the evolving nature of our library, not all types are fully supported yet, leading to potential exceptions for unhandled cases.
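
A sketch of the moving parts; the `primitive` and `sequence` names are illustrative, and the extra constructor stands in for containers the library doesn’t handle yet:

```reason
/* The primitive values our statistics understand */
type primitive =
  | Bool(bool)
  | Int(int)
  | Float(float)
  | Str(string);

/* Supported containers, plus a placeholder for future ones */
type sequence('a) =
  | L(list('a))
  | A(array('a))
  | Unsupported;

exception UnsupportedType(string);

/* Unpack any supported container into a plain list */
let secToList = (seq: sequence('a)): list('a) =>
  switch (seq) {
  | L(l) => l
  | A(a) => Array.to_list(a)
  | Unsupported => raise(UnsupportedType("container not supported yet"))
  };
```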

To ensure a smooth user experience, we wrap secToList in a function that gracefully handles exceptions, providing clear feedback rather than abrupt errors.
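
For example, returning a `result` instead of raising:

```reason
let safeSecToList = (seq: sequence('a)): result(list('a), string) =>
  switch (secToList(seq)) {
  | l => Ok(l)
  | exception (UnsupportedType(msg)) => Error("secToList failed: " ++ msg)
  };
```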

Diving into the core of our library, we aim to offer basic but essential statistical functions, each designed to work with a specific type of data.

getCounts tackles enumeration, mapping each unique string in a list to its frequency of occurrence, ideal for categorical data analysis.
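
A fold over the list suffices:

```reason
let getCounts = (xs: list(string)): StringMap.t(int) =>
  List.fold_left(
    (m, x) => {
      let seen =
        switch (StringMap.find_opt(x, m)) {
        | Some(n) => n
        | None => 0
        };
      StringMap.add(x, seen + 1, m);
    },
    StringMap.empty,
    xs,
  );
```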

getMedian finds the middle value in a list of floats after sorting, a key measure of central tendency for quantitative data.
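
Averaging the two middle elements when the length is even:

```reason
let getMedian = (xs: list(float)): option(float) => {
  let sorted = List.sort(compare, xs);
  let n = List.length(sorted);
  if (n == 0) {
    None;
  } else if (n mod 2 == 1) {
    Some(List.nth(sorted, n / 2));
  } else {
    Some((List.nth(sorted, n / 2 - 1) +. List.nth(sorted, n / 2)) /. 2.0);
  };
};
```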

getMean computes the average from a list of floats, providing a quick snapshot of your data’s central value.
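
With `None` for the empty list, so we never divide by zero:

```reason
let getMean = (xs: list(float)): option(float) =>
  switch (xs) {
  | [] => None
  | _ =>
    Some(List.fold_left((+.), 0.0, xs) /. float_of_int(List.length(xs)))
  };
```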

Structuring our output, we define types for summarizing string and float data, encapsulating counts for categorical data and central measures for continuous data.
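
One possible shape:

```reason
type stringStats = {counts: StringMap.t(int)};

type floatStats = {
  mean: option(float),
  median: option(float),
};
```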

Data Analysis

Source: What is Exploratory Data Analysis?, towardsdatascience.com

Integrating complex data structures into actionable insights requires a systematic approach. We’ve established methods to deconstruct complex data into primitive lists and have statistical functions ready. The challenge now is to seamlessly connect these components.

First, let’s categorize data types based on their analytical utility:

  • Nominal data, like colors “Blue” and “Yellow”, are distinct without a meaningful order. The mode serves as their central tendency measure.
  • Boolean data, a subset of nominal, consists of binary choices like “Yes” and “No”.
  • Ordinal data, such as ratings from 1 (“Low”) to 5 (“High”), imply order but not the uniformity of scale. Their median reflects central tendency.
  • Numeric data, representing continuous or interval data, can be fully quantified. The mean is used to express their central tendency.

Source: Nominal, Ordinal, Interval and Ratio Data, Microbe Notes

Recognizing that not all operations apply universally across data types underscores the importance of targeted analytical strategies.

Transitioning to the implementation phase:

The `listToFloat` function converts a list of primitives to floats, selectively allowing boolean and integer conversions based on flags:
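
A sketch, assuming the `primitive` type from earlier:

```reason
let listToFloat = (~allowBool=false, ~allowInt=true, xs: list(primitive)) =>
  List.fold_right(
    (x, acc) =>
      switch (x) {
      | Float(f) => [f, ...acc]
      | Int(i) when allowInt => [float_of_int(i), ...acc]
      | Bool(b) when allowBool => [b ? 1.0 : 0.0, ...acc]
      | _ => acc /* drop values the flags rule out */
      },
    xs,
    [],
  );
```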

Converting primitives to strings while handling incompatible types by casting to floats, then stringifying, allows for flexible data interpretation:
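
Something like the following, with an optional debug trace for each cast:

```reason
let listToString = (~debug=false, xs: list(primitive)) =>
  List.map(
    x =>
      switch (x) {
      | Str(s) => s
      | other => {
          /* Route non-strings through the float conversion first */
          let s =
            switch (listToFloat(~allowBool=true, ~allowInt=true, [other])) {
            | [f] => string_of_float(f)
            | _ => "<unconvertible>"
            };
          if (debug) {
            print_endline("cast to string: " ++ s);
          };
          s;
        }
      },
    xs,
  );
```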

`getFloatStats` and `getStringStats` functions then process these lists into statistical summaries, considering the data level restrictions and debug options respectively:
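
Tying the conversions and the summary types together:

```reason
let getFloatStats = (~allowBool=false, ~allowInt=true, xs) => {
  let floats = listToFloat(~allowBool, ~allowInt, xs);
  {mean: getMean(floats), median: getMedian(floats)};
};

let getStringStats = (~debug=false, xs) => {
  counts: getCounts(listToString(~debug, xs)),
};
```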

Finally, transforming primitive lists to the expected input format for statistical analysis:
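
Lifting each column from the transposed database into `primitive` lists:

```reason
let ageColumn = List.map(a => Int(a), cols.ages);
let weightColumn = List.map(w => Float(w), cols.weights);
let genderColumn = List.map(g => Str(g), cols.genders);
```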

Extracting statistics becomes straightforward with prepped data:
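
For example:

```reason
let ageStats = getFloatStats(~allowInt=true, ageColumn);
let weightStats = getFloatStats(weightColumn);
let genderStats = getStringStats(genderColumn);
```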

Now, it’s just a matter of calling the appropriate statistical function for each data type, with optional debugging for string conversions:
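
A dispatcher over the four measurement levels might look like this (the `level` and `summary` names are illustrative):

```reason
type level =
  | Nominal
  | Boolean
  | Ordinal
  | Numeric;

type summary =
  | Categorical(stringStats)
  | Continuous(floatStats);

let summarize = (~debug=false, lvl: level, xs: list(primitive)): summary =>
  switch (lvl) {
  | Numeric
  | Ordinal => Continuous(getFloatStats(~allowInt=true, xs))
  | Nominal
  | Boolean => Categorical(getStringStats(~debug, xs))
  };
```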

Wrapping up our deep dive into handling a database use case with ReasonML, it’s clear we’ve covered significant ground. From extracting raw data to applying a generic statistics library, this journey showcased ReasonML’s capability to tackle complex data challenges head-on.

I hope you enjoyed reading my article series. Stay tuned!
