Recently, I got curious about HTML parsing—you know, that thing browsers do billions of times a day that we all take completely for granted.
How hard could it be? I thought, like every programmer before me who has wandered into this particular circle of hell.
Turns out: very hard. HTML parsing isn’t just about recognizing <div> tags and calling it a day. It’s a complex mess of state machines, error recovery, and edge cases with countless bizarre scenarios.
The good news? HTML parsing is a solved problem, thoroughly documented in the WHATWG and W3C specifications. I went with the WHATWG HTML Living Standard [1] because it’s what all modern browsers actually implement, and it’s actively maintained. The WHATWG spec defines a parsing algorithm so intricate that implementing it correctly is genuinely challenging.
So naturally, I did it in Kotlin. Because why make life easy?
Note: This article is being written while the implementation is still ongoing, so some details and approaches may evolve as I continue learning and refining the parser.
Why HTML Parsing is Actually Terrible
Before diving into the implementation details, here’s what I discovered after spending way too many weekends reading the WHATWG spec:
The Fundamental Problem
HTML was designed to be written by humans, which means it was designed to be forgiving. Missing a closing tag? No problem! Forgot to escape that & character? We’ll figure it out! Nested a <p> inside another <p>? Sure, why not—we’ll just magically close the first one!
This sounds great in theory, but in practice, it means the parser has to be psychic. It needs to handle:
- The 80+ tokenization states: Each with its own special rules for what each character means
- The 20+ tree construction modes: Because apparently, parsing <script> content is different from parsing <style> content is different from parsing regular content
- Character entity madness: There are over 2,000 named character entities, plus numeric ones, plus special rules for when they’re malformed
- And much more: I haven’t even reached all the edge cases yet
Two-Phase Parsing
The WHATWG spec splits parsing into two phases, presumably to prevent implementers from going completely insane:
- Tokenization: Turn the character soup into discrete tokens
- Tree Construction: Turn those tokens into a document tree
This separation makes sense—tokenization handles all the character-level complexity, while tree construction handles all the structural complexity.
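Conceptually, the pipeline is just these two phases glued together. Here’s a minimal sketch of the wiring (the Tokenizer constructor and the nextToken/process names are my hypothetical glue, not code from the actual parser):

fun parse(html: String): TreeNode {
    val tokenizer = Tokenizer(html)
    val treeBuilder = TreeBuilder()
    while (true) {
        // Phase 1: tokenization turns the character soup into tokens.
        val token = tokenizer.nextToken() ?: break
        // Phase 2: tree construction hangs each token on the document tree.
        treeBuilder.process(token)
    }
    return treeBuilder.document
}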
Why I Chose Kotlin
Now, you might be wondering: why Kotlin? Isn’t this the kind of thing you’d normally write in C++ or Rust, languages that are designed to make you suffer in more interesting ways?
Here’s the thing: I only know Kotlin really well, and frankly, I didn’t want to suffer through learning another language’s syntax on top of implementing the complexity of an HTML parser. Fortunately, Kotlin turned out to be perfect for the job—it’s expressive enough that I can write state machines using DSLs that read almost like the English spec itself. It’s like having a magic translator that converts WHATWG bureaucratic prose directly into working code.
Let me show you what I mean:
Implementation Approach
The WHATWG spec defines parsing as a state machine with precise rules for each state transition. Here’s where Kotlin’s expressiveness really shines—I can model these states in a way that directly mirrors the specification.
Modeling Tokens
First, I need to represent the different token types the spec defines. Kotlin’s sealed interfaces handle this perfectly:
internal sealed interface Token {
    data class StartTag(
        val name: String,
        val selfClosing: Boolean = false,
        val attributes: MutableMap<String, String> = LinkedHashMap()
    ) : Token {
        fun addAttribute(name: String, value: String) {
            // Per the spec, duplicate attributes are ignored. Lowercase the
            // name once so the containsKey check and the insert use the same key.
            val key = name.lowercase()
            if (!attributes.containsKey(key)) {
                attributes[key] = value
            }
        }
    }

    data object DocType : Token
    data class EndTag(val name: String) : Token
    data class Comment(val data: String) : Token
    data class Character(val data: Char) : Token
}
The type system ensures I can’t accidentally mix up token types, and Kotlin’s when expressions force me to handle every possible case.
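For instance, a toy helper like this (my example, not part of the parser) won’t compile if any Token subtype is left unhandled:

// Drop any branch and the compiler flags the `when` as non-exhaustive.
fun describe(token: Token): String = when (token) {
    is Token.StartTag -> "<${token.name}>"
    is Token.EndTag -> "</${token.name}>"
    is Token.Comment -> "<!--${token.data}-->"
    is Token.Character -> token.data.toString()
    Token.DocType -> "<!doctype html>"
}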
From Spec to Code
Here’s what makes Kotlin particularly good for this: I can translate the WHATWG prose almost directly into code. Take the “Character reference state” [2] from section 13.2.5.72 of the spec:
Set the temporary buffer to the empty string. Append a U+0026 AMPERSAND (&) character to the temporary buffer. Consume the next input character:
- ASCII alphanumeric: Reconsume in the named character reference state.
- U+0023 NUMBER SIGN (#): Append the current input character to the temporary buffer. Switch to the numeric character reference state.
- Anything else: Flush code points consumed as a character reference. Reconsume in the return state.
This translates almost word-for-word into Kotlin:
internal class CharacterReference(
    private val returnState: TokenizationState,
) : TokenizationState {
    override fun consume(codePoint: Int, tokenizer: Tokenizer): StateTransition {
        // "Set the temporary buffer to the empty string. Append a U+0026
        // AMPERSAND (&) character to the temporary buffer."
        tokenizer.tempBuffer.clear()
        tokenizer.tempBuffer.append(Chars.Ampersand.char)
        return when {
            codePoint.isAsciiAlphaNumeric() -> reconsumeCurrentChar {
                NamedCharacterReference(returnState)
            }
            codePoint.IS(Chars.NumberSign) -> {
                tokenizer.tempBuffer.append(codePoint.toChar())
                consumeNextChar { NumericCharacterReference(returnState) }
            }
            else -> {
                returnState.flushCodePoints(tokenizer)
                reconsumeCurrentChar { returnState }
            }
        }
    }
}
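The reconsumeCurrentChar and consumeNextChar helpers are the DSL doing the heavy lifting. Their exact shape isn’t shown here, but a plausible sketch is tiny (the names match the snippet above; the rest is my guess at the plumbing):

// A transition tells the tokenizer which state comes next and whether
// the current character should be handled again by that state.
internal sealed interface StateTransition {
    val nextState: TokenizationState
    data class Reconsume(override val nextState: TokenizationState) : StateTransition
    data class Advance(override val nextState: TokenizationState) : StateTransition
}

// Re-handle the current input character in the next state.
internal inline fun reconsumeCurrentChar(next: () -> TokenizationState): StateTransition =
    StateTransition.Reconsume(next())

// Move to the next input character before entering the next state.
internal inline fun consumeNextChar(next: () -> TokenizationState): StateTransition =
    StateTransition.Advance(next())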
Current Implementation Progress
I’ve made significant progress on the tokenizer, implementing about 24 tokenization states so far. The parser can handle:
Core Tokenization:
- Basic tag parsing (open/close tags, tag names, attributes)
- Character references (both named entities like &amp; and numeric ones like &#65;)
- Self-closing tags
- Attribute parsing (quoted, unquoted, with proper error handling)
Tree Construction:
- All major insertion modes (InBody, InHead, InTable, etc.)
- Element stack management
- Basic DOM tree building
The implementation follows the WHATWG spec closely, but I’m being selective about which edge cases to implement. I’m not planning to handle every possible malformed HTML scenario—frankly, I’m not sure exactly how I’ll use this parser yet. Maybe as an agnostic parsing library for custom rendering, maybe for data extraction, definitely not for building a browser (I’m not that ambitious).
What’s Missing:
- Script and style content parsing (complex and I may not need them)
- Some of the more obscure tokenization states
- Full error recovery for severely malformed HTML
The core functionality works well enough to parse most real-world HTML documents. Whether I implement the remaining edge cases depends on what use cases emerge.
Character Entities: Where Sanity Goes to Die
Remember when I mentioned that there are over 2,000 named character entities? Well, buckle up, because this is where HTML parsing goes from “complicated” to “actively malicious.”
Think &amp; and &lt; are simple? Think again. The spec has special rules for:
- Ambiguous ampersands: What happens when you write &notit; and the entity table contains &not but nothing named &notit;?
- Missing semicolons: &amp without a semicolon is sometimes valid, but only in certain contexts
- Invalid numeric ranges: &#x110000; is outside the valid Unicode range, so it becomes the replacement character (a sketch of these fix-ups follows this list)
- Legacy compatibility: Some malformed entities are “fixed” for backwards compatibility with ancient HTML
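To make the numeric rules concrete, here’s a minimal sketch of the fix-ups the spec mandates before a numeric reference is emitted (my illustration of the rules, not the parser’s actual code; the spec additionally remaps 0x80–0x9F through a windows-1252 table, omitted here):

fun sanitizeNumericReference(code: Int): Int = when {
    code == 0x00 -> 0xFFFD             // NUL is never emitted
    code > 0x10FFFF -> 0xFFFD          // beyond the last Unicode code point
    code in 0xD800..0xDFFF -> 0xFFFD   // lone surrogates aren't characters
    else -> code
}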
Named Character References
Let’s say you’re parsing this HTML: <div title="He said &not today">. When you hit that ampersand, you need to figure out if it’s an entity. But here’s the problem: there are multiple entities that start with “not”:
- &not; → ¬ (logical NOT symbol)
- &notin; → ∉ (not an element of)
- &notni; → ∌ (does not contain as member)
The WHATWG spec says: be greedy. Consume as many characters as possible to form the longest valid entity. This is called “maximal munch” [3].
The Trie Structure
Instead of scanning through 2,000+ entities linearly, I built a trie keyed character by character.
Each node in the trie can serve multiple roles:
- Intermediate path nodes: Like “A” → “E” → “L” → “i”, which form part of longer entity names
- Terminal nodes with codepoints: Nodes that mark complete entities (like “AElig” → Æ)
- Both simultaneously: A node can be both a valid endpoint AND continue to longer entities (like “AMP” which is complete but can extend to “AMP;”)
This dual nature is crucial for the WHATWG spec’s “maximal munch” behavior. For example, when parsing &AMP;, the algorithm will:
- Find a valid match at “AMP” (stores this as a potential result)
- Continue and find an even better match at “AMP;” (updates the result)
- Return the longest valid match found
Both &AMP and &AMP; exist as separate entries in the NamedChars map, so both paths have their own codepoints arrays set in the trie nodes.
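A node needs surprisingly little state. Here’s a minimal sketch of the shape the traversal below relies on (the children and codepoints names match the walk code; the class itself is my simplification):

class TrieNode {
    // One child per next character in some entity name.
    val children: MutableMap<Char, TrieNode> = HashMap()
    // Non-null only when the path to this node spells a complete entity.
    var codepoints: IntArray? = null
}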
Following the Trail
Here’s how the algorithm works when parsing &not today:
var currentNode = EntityTrie.root
var consumedChars = ""
var lastMatch: Triple<String, IntArray, Int>? = null
while (true) {
    val char = inputBuffer[position] ?: break
    val nextNode = currentNode.children[char] ?: break
    currentNode = nextNode
    consumedChars += char
    position++
    // If this node has codepoints, it's a complete entity
    currentNode.codepoints?.let {
        lastMatch = Triple(consumedChars, it, position)
    }
}
Let’s trace through the steps: n, o, and t all have matching children, so we descend three levels. The node for “not” carries codepoints (&not is a legacy entity that’s valid even without a semicolon), so lastMatch records it. The space character has no child in the trie, the loop breaks, and the recorded match wins.
When Paths Get Longer
Here’s a more complex example showing how the algorithm handles invalid entities. If we parsed &notit; (note: this isn’t a real HTML entity, but &not is):
The key insight: we greedily consume characters as long as valid paths exist, but always remember the longest complete entity we’ve seen. In this case, even though &notit doesn’t exist as an entity, the algorithm falls back to &not → ¬ (the longest valid match it found) and leaves “it;” for further parsing, resulting in the string ¬it;.
This demonstrates how the implementation handles malformed or non-existent entities gracefully: it doesn’t fail completely but instead uses the best partial match available.
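Continuing the walk above, the post-match step might look something like this (emit, rewindTo, and emitRaw are hypothetical helpers standing in for the tokenizer’s real flush logic):

val match = lastMatch
if (match != null) {
    val (_, codepoints, endPosition) = match
    emit(codepoints)              // "&not" -> the ¬ code point
    rewindTo(endPosition)         // "it;" goes back to the input stream
} else {
    emitRaw("&" + consumedChars)  // nothing matched; keep the raw text
}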
Context-Sensitive Edge Cases
The spec has a weird rule for attribute values. Consider:
<a href="?foo=1&amp">  <!-- treat &amp as literal text -->
<a href="?foo=1&amp;"> <!-- decode &amp; to & -->
My implementation handles this context sensitivity:
if (returnState.isAttributeState() && !match.endsWith(';') &&
    (nextChar == '=' || nextChar?.isLetterOrDigit() == true)
) {
    // Legacy behavior: keep the raw text instead of decoding.
    // (The spec's check is "next char is = or ASCII alphanumeric".)
    tokenizer.currentAttributeValue.append(match)
}
This whole system works like following a GPS route—you keep driving as long as there’s a valid path ahead, but you remember the last landmark you passed in case you need to turn around.
Tree Construction: The Fun Isn’t Over Yet
Okay, so you’ve survived the tokenization phase. Your HTML has been chopped up into a nice stream of tokens. Surely the hard part is over, right?
Right?
Nope! Now we get to build a tree from those tokens, and this is where HTML’s “forgiving” nature really shines.
I’ve just started implementing the tree construction phase, and I’m already facing some decisions. The WHATWG spec defines not just how to build a tree, but also a full DOM API with all the bells and whistles. The question is: should I implement the entire DOM specification from scratch and go completely crazy, or keep it simple for now with a minimal tree structure?
For now, I’m taking the simple approach.
The Current Architecture
My tree builder follows the insertion mode pattern from the spec:
interface InsertionMode {
fun process(token: Token, treeBuilder: TreeBuilder)
}
class TreeBuilder {
private var insertionMode: InsertionMode = Initial
private val openElements = mutableListOf<TreeNode>()
val document: TreeNode = TreeNode.createDocument()
}
The tree construction algorithm has to handle some interesting scenarios:
- Implicit tag closing: <p>Hello<div>World</div> automatically closes the <p> before opening the <div> (a toy sketch of this follows the list)
- The stack of open elements: Track exactly which elements are open and in what order
- Multiple insertion modes: Initial, BeforeHTML, InHead, InBody, etc.—each with different rules
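Here’s a toy version of that first scenario (closeParagraphIfOpen is a hypothetical helper; the spec’s actual InBody rules cover far more tags and scope checks):

object InBody : InsertionMode {
    // Start tags that implicitly close an open <p> per the spec's InBody rules.
    private val closesParagraph = setOf("div", "p", "ul", "ol", "blockquote")

    override fun process(token: Token, treeBuilder: TreeBuilder) {
        if (token is Token.StartTag && token.name in closesParagraph) {
            treeBuilder.closeParagraphIfOpen()
        }
        // ...the rest of the InBody rules
    }
}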
The TreeNode: Intentionally Simple
Instead of implementing a full DOM, I went with a minimal tree structure:
data class TreeNode(
    val type: NodeType,
    val name: String? = null,   // Tag name
    var data: String? = null,   // Text content
    val attributes: MutableMap<String, String> = LinkedHashMap(), // Attributes
    val children: MutableList<TreeNode> = mutableListOf(),        // Child nodes
    var parent: TreeNode? = null                                  // Parent reference
)
No DOM methods like querySelector or addEventListener—just the pure tree structure. This keeps things focused on the parsing problem rather than getting lost in DOM API complexity.
Smart Details: Text Node Merging
One thing I did implement is automatic text node merging:
fun insertText(data: String) {
val lastChild = currentNode.lastChild
if (lastChild?.type == NodeType.TEXT) {
// Merge with previous text node
lastChild.data = (lastChild.data ?: "") + data
} else {
val textNode = TreeNode.createText(data)
currentNode.appendChild(textNode)
}
}
So parsing <p>Hello World</p> creates one text node, not two separate ones.
Current Status and the Big Question
I have the basic insertion modes stubbed out and the core tree building logic working. But I keep coming back to the same question: how far should I take this?
The WHATWG spec has hundreds of pages on DOM APIs. Do I implement Element.querySelector()? What about MutationObserver? Event handling? CSS selector matching?
For now, I’m resisting the urge to build a full browser engine. The tree structure I have is sufficient for parsing HTML and representing the document structure. If I need more functionality later, I can always build a DOM API layer on top of this foundation.
The tree construction phase is turning out to be much more straightforward than tokenization—at least so far.
Testing: How to Know You’re Not Completely Wrong
Here’s the thing about HTML parsing: you can’t just write some tests and call it a day. The edge cases are so numerous and bizarre that you need industrial-strength test coverage. There are two main test suites for this:
Web Platform Tests (WPT) [4] - The official W3C test suite used by all major browsers. Comprehensive but complex to integrate.
html5lib [5] - A more focused test suite specifically for HTML parsing with a simpler JSON format.
I went with html5lib for tokenizer tests since it’s much easier to understand and implement. The test format is straightforward:
{
    "input": "&AElig",
    "description": "Named entity: AElig without a semi-colon",
    "output": [
        [
            "Character",
            "\u00c6"
        ]
    ],
    "errors": [
        { "code": "missing-semicolon-after-character-reference", "line": 1, "col": 7 }
    ]
}
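Each case deserializes into a small data class. Here’s a sketch of the shape I’d use, assuming kotlinx.serialization (field names mirror the JSON keys; a few optional html5lib keys like initialStates are omitted):

import kotlinx.serialization.Serializable
import kotlinx.serialization.json.JsonElement

@Serializable
data class ParseError(val code: String, val line: Int, val col: Int)

@Serializable
data class TestCase(
    val description: String = "",
    val input: String,
    // Each entry mixes strings, objects, and booleans, e.g. ["Character", "\u00c6"],
    // so JsonElement keeps the parsing flexible.
    val output: List<List<JsonElement>>,
    val errors: List<ParseError> = emptyList()
)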
Building a Test Generator
Rather than manually writing 5,600+ test cases, I built a test generator that fetches the JSON from html5lib and converts it to actual Kotlin test cases:
private fun generateTest(testCase: TestCase, testName: String): String {
    return buildString {
        appendLine("    @Test")
        // Backticks let the generated function name keep the spaces from the
        // original test description.
        appendLine("    fun `$testName`() = assertTokenization(")
        appendLine("        input = \"${testCase.input.escapeString()}\",")
        appendLine("        expectedTokens = listOf(")
        appendLine(generateExpectedTokens(testCase.output))
        appendLine("        )")
        appendLine("    )")
    }
}
The generator preserves test case names, inputs, and expected outputs while converting them to idiomatic Kotlin tests. For example, a JSON test like:
{
"description": "Quoted attribute followed by permitted /",
"input": "<br a='b'/>",
"output": [
[
"StartTag",
"br",
{
"a": "b"
},
true
]
]
}
Becomes:
@Test
fun `Quoted attribute followed by permitted slash`() = assertTokenization(
input = "<br a='b'/>",
expectedTokens = listOf(
startTag(
name = "br",
attributes = mapOf(
"a" to "b",
),
selfClosing = true
),
)
)
This approach gave me confidence in the tokenizer implementation—having 5,600+ tests from the official html5lib suite means I can catch edge cases I never would have thought of. Plus, if there’s ever an issue with my test expectations, I can easily trace back to the original JSON test case to verify the correct behavior.
Someone, somewhere, tried to use &#x110000; in their HTML (Unicode only goes up to 0x10FFFF). The fact that this edge case has its own test tells you everything you need to know about the state of HTML on the internet.
Reflections
Building this parser taught me that the WHATWG spec isn’t academic theory—it’s web archaeology. Every bizarre rule exists because some website from 1999 would break without it. The 80+ tokenizer states that initially seemed overwhelming are actually elegant: each state knows exactly what it’s responsible for and nothing else. Kotlin’s expressiveness was perfect for this—I could translate spec prose almost word-for-word into working code.
The html5lib test suite was both humbling and educational. Choosing Kotlin Multiplatform turned out well too—HTML parsing is pure algorithms with no platform dependencies, so getting JVM, JavaScript, and native targets for free was a nice bonus.
Should you use this parser? Definitely not yet—it’s still a learning project that I haven’t made public. It currently passes all of the html5lib tokenization tests and handles basic tree construction, but I want to get it producing proper DOM trees before sharing the code. Either way, implementing it has been educational—I understand how browsers handle malformed HTML much better now, and I’m content with having built a solid foundation for whatever comes next.
1. WHATWG HTML Living Standard - Parsing - The definitive specification for HTML parsing that all modern browsers implement
2. Character Reference State - The WHATWG HTML specification section defining how to handle character references (entities) during tokenization
3. Maximal munch - A term from compiler theory meaning “always consume the longest valid match” when multiple parsing options are available
4. Web Platform Tests - A cross-browser test suite for web platform specifications, ensuring consistent implementation across browsers
5. html5lib Test Suite - Comprehensive test cases covering every edge case in HTML parsing, used by major browser implementations