Tomi Chen

Searching and Displaying Large Datasets with Svelte

By Tomi Chen June 8, 2021

Searching large datasets can be a challenge, especially if you’re rendering the results, and even more so if you’re updating results as the user is typing. Not only is re-rendering thousands of DOM nodes taxing on memory, doing so while the user types can make the text input unresponsive, creating a poor experience. Here, I’ll share my process of working through these issues and improving the user experience.

Background

These last few days, I’ve been working on TwemojiExplorer, a website that allows easy access to Twitter’s Twemoji emoji set. All the emojis are there, sorted by category, with buttons to copy the SVG, the Unicode codepoint, and a link to one of my favorite websites, Emojipedia. When building websites, I occasionally want to sprinkle in some emojis to spice things up. The problem was that my process was a tad convoluted.

  1. First, I had to go to one of my favorite websites, Emojipedia, to look up the emoji and find the codepoint.
  2. Next, I had to go to Twemoji’s GitHub and search for the proper file.
  3. After locating the correct emoji file, I could finally copy the SVG.
  4. Then I had to repeat the process for any other emojis I needed.

I decided to build a tool to help me, so the target audience of this site is myself. Since I’m lazy and this site is primarily for my personal use, it’s not entirely mobile-friendly or keyboard accessible. Improving the mobile and keyboard experience is something I might work on in the future.

UPDATE: I improved the mobile and keyboard experience with a few small tweaks. You can read all about it!

The Problem

I wanted this website to search all 1,805 emojis as the user is typing. No hitting the Enter key for me! There should be a card for each emoji with buttons to copy the SVG, codepoint, and a link to the corresponding page on one of my favorite websites, Emojipedia. I also wanted to try using Elder.js, a static site generator using Svelte.

I’m not going to implement a search library myself, since that’s another project for another time, but I’ll be integrating one and (trying to) optimize the project.

Attempt #1: Fuse.js

My first attempt was using Fuse.js for the search. While I didn’t need fuzzy searching, we were planning on using Fuse for another project, so I wanted to try it out. Using Fuse turned out to be a bad idea, and after typing more than a few characters, it would lag out to oblivion. I do think you could make it work, especially with some of the tricks described in the rest of the article, but I decided to look for another solution.

Another issue was how I was rendering the cards.

{#each emojis as emoji}
  <!-- $displayItems is a Svelte store containing an array of search results -->
  {#if !$displayItems || $displayItems.find((obj) => obj.slug === emoji.slug)}
    <Card {emoji} />
  {/if}
{/each}

Not only was I iterating through every emoji, but for each one I was also scanning the search results with find() until it hit a match, so the work grew with the number of emojis multiplied by the number of results. The combined performance was so bad that I didn’t even commit it to git. For more information on JavaScript’s Array.prototype.find(), see MDN.
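
To put a rough number on that cost, here’s a small sketch (not code from the project) that counts how often the find() callback runs when every emoji is checked against the results. The countComparisons helper and the sample data are made up for illustration.

```javascript
// Hypothetical helper: counts how many times the find() callback runs when
// every emoji is checked against the search results, as in the template above.
function countComparisons(emojis, results) {
  let comparisons = 0;
  for (const emoji of emojis) {
    results.find((result) => {
      comparisons += 1;
      return result.slug === emoji.slug;
    });
  }
  return comparisons;
}

// Made-up data: 1,805 emojis, with the last 500 of them as the search results.
const allEmojis = Array.from({ length: 1805 }, (_, i) => ({ slug: `emoji-${i}` }));
const searchResults = allEmojis.slice(-500);

console.log(countComparisons(allEmojis, searchResults));
```

With this made-up data, a single render runs the callback several hundred thousand times, and that happens again on every keystroke.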

Attempt #2: FlexSearch

I searched around for lightweight, maintained, and performant libraries. I could’ve gotten away with not having those first two since I’m already shipping a ton of data to the client and I’m not going to be proactively maintaining this application anyway, but it’s just a habit.

I stumbled upon FlexSearch, which checked all of my boxes, even if the documentation is a little confusing. To be honest, I’m not sure if I’m configuring it correctly, but it seems to work fine. I also updated how I was rendering the cards.

<!-- See https://github.com/tctree333/TwemojiExplorer/blob/dcfc640550/src/components/SearchBar.svelte -->
<script>
  // ...
  displayItems.set(
    res.reduce((acc, curr) => {
      acc[curr.emoji] = true;
      return acc;
    }, {})
  );
</script>

{#each emojis as emoji}
  <!-- $displayItems is a Svelte store containing an object of {emoji: boolean} -->
  {#if !$displayItems || $displayItems[emoji.emoji] !== undefined}
    <Card {emoji} />
  {/if}
{/each}

Now, I was iterating through all the emojis only once, transforming the search results into an object up front, and checking each emoji with a single property lookup instead of scanning an array, which I believe is faster. Still not great, but it was (at least) usable. The biggest issue was deleting the last character in the search bar: it could take up to 2 whole seconds after hitting the delete key for the character to disappear, which doesn’t sound like much but feels super slow. This performance hit comes from going from rendering a subset of the cards to rendering all of them at once.

To optimize this, I switched to iterating over the search results instead of over every emoji, which reduces processing. Iterating over the search results also allowed me to count the number of emojis in each category and hide categories with no emojis that match the search. I modified the search to return a list of objects, one per category, each containing a list of that category’s matching emojis. I can then iterate through both lists to render all the emoji cards.
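
As a rough illustration of that restructuring, here’s a minimal sketch that groups flat search hits into per-category objects. It assumes each hit carries a category field; that field name, the groupByCategory helper, and the sample data are assumptions, not the project’s actual data shape.

```javascript
// Hypothetical helper: turns a flat list of search hits into a list of
// { name, emojis } objects, one per category that has at least one hit.
function groupByCategory(hits) {
  const byName = new Map(); // a Map keeps categories in first-seen order
  for (const hit of hits) {
    if (!byName.has(hit.category)) {
      byName.set(hit.category, { name: hit.category, emojis: [] });
    }
    byName.get(hit.category).emojis.push(hit);
  }
  return [...byName.values()];
}

// Made-up search hits for illustration.
const hits = [
  { emoji: "😀", category: "Smileys" },
  { emoji: "🐍", category: "Animals" },
  { emoji: "😂", category: "Smileys" },
];

for (const group of groupByCategory(hits)) {
  console.log(`${group.name}: ${group.emojis.length}`);
}
```

Categories without any hits never make it into the list, so hiding them comes for free, and the per-category count is just the length of each emojis array.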

Improvement Attempt: Creating a FlexSearch index for each emoji category

Looking back, this optimization was probably unnecessary since I don’t think the bottleneck is in the search. I’m also searching all the categories anyway, and that work is CPU-bound rather than I/O-bound, so using Promise.all to run the searches in “parallel” doesn’t boost performance. However, it did require some data restructuring, which at least reduced client-side transformations.
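
Here’s a small sketch of why Promise.all doesn’t help with this kind of work. The searchCategory function and the category names are stand-ins, and the loop is a placeholder for a CPU-bound search; the point is only that an async function without an await runs to completion before the next one starts.

```javascript
const log = [];

// Stand-in for a CPU-bound search: there is no await inside, so the whole
// body runs synchronously on the main thread when the function is called.
async function searchCategory(name) {
  log.push(`start ${name}`);
  let total = 0;
  for (let i = 0; i < 100000; i += 1) {
    total += i; // placeholder for real search work
  }
  log.push(`end ${name}`);
  return total;
}

// Wrapped in an arrow function so map() passes only the category name.
const pending = Promise.all(
  ["people", "nature", "food"].map((name) => searchCategory(name))
);

// Each search already finished before the next one started.
console.log(log.join(", "));
```

The log shows each start immediately followed by its own end, so the three searches ran one after another even though they were handed to Promise.all together.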

See the commit on GitHub. For more information on using asynchronous JavaScript functions in sequence or parallel, see this blog by James Sinclair.

This change didn’t noticeably improve anything, but it didn’t seem to hurt either, so I just left it in.

Virtualizing the not virtual DOM

I realized that the bottleneck here was rendering the cards, so I thought I could use a virtualization library to avoid rendering cards that aren’t in the viewport.

Virtualization works on the idea that if the user can’t see the element, there’s no reason it should be in the DOM. This strategy allows efficient rendering of lists that can have millions of items.
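
The core calculation can be sketched roughly like this. It is a simplified illustration, not something from the project or from any particular library: it assumes every row has the same fixed height and a known number of items per row, and the visibleRange helper and its parameter names are made up.

```javascript
// Hypothetical helper: given the scroll position, returns the slice of items
// (start inclusive, end exclusive) that should be in the DOM right now.
function visibleRange({ scrollTop, viewportHeight, rowHeight, itemsPerRow, itemCount, overscanRows = 1 }) {
  const totalRows = Math.ceil(itemCount / itemsPerRow);
  const firstVisibleRow = Math.floor(scrollTop / rowHeight);
  const lastVisibleRow = Math.ceil((scrollTop + viewportHeight) / rowHeight) - 1;
  // Render a few extra rows above and below so scrolling doesn't flash blanks.
  const firstRow = Math.max(0, firstVisibleRow - overscanRows);
  const lastRow = Math.min(totalRows - 1, lastVisibleRow + overscanRows);
  return {
    start: firstRow * itemsPerRow,
    end: Math.min(itemCount, (lastRow + 1) * itemsPerRow),
  };
}

// Made-up layout numbers: 100px rows, 8 cards per row, 1,805 cards in total.
const { start, end } = visibleRange({
  scrollTop: 250,
  viewportHeight: 300,
  rowHeight: 100,
  itemsPerRow: 8,
  itemCount: 1805,
});
console.log(`render items ${start} to ${end - 1}`);
```

Only a few dozen cards end up in the DOM at any time, no matter how long the list is. The fixed-size assumption is exactly what a dynamic grid makes hard to satisfy.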

However, two things make this tricky for this particular project:

  1. The grid of items is dynamic and changes depending on the search results.
  2. I want to scroll on the entire page, not in a box.

I tried out a couple of different Svelte virtualization libraries, but none of them worked the way I wanted. I also tried writing my own using IntersectionObserver, but it was super glitchy and wack. While virtualization is a powerful technique, I haven’t found the proper implementation for this particular situation.

Poor Man’s Virtualization: The “Show More” button

Since I was not able to implement virtualization, I decided to cheat and hide the elements anyway. The primary method for looking for emojis is through the search bar, so scrolling through all the emojis isn’t something you’re going to do, especially if you’re only partially through a search. I decided to limit the number of emojis initially displayed in each category to 23 (3 rows of 8 at max-width, minus one slot for the Show More button).

See the commit on GitHub.

This solution is a little more aggressive but works great for my situation. With this optimization, the delay between hitting the delete key on the last character and the character actually being deleted is much shorter.

<div class="grid">
  <!-- Initial emoji display -->
  {#each emojis.slice(0, showDefault) as emoji}
    <Card {emoji} />
  {/each}
  <!-- Check if there are emojis left -->
  {#if emojis.length > showDefault}
    {#if showAll}
      <!-- Show them -->
      {#each emojis.slice(showDefault) as emoji}
        <Card {emoji} />
      {/each}
    {:else}
      <!-- Display the Show More button -->
      <button
        on:click={() => {
          showAll = true;
        }}>
        <p>Show All</p>
      </button>
    {/if}
  {/if}
</div>

Debouncing the Input

Even with all these optimizations, there was still a little bit of a delay. The solution? Debouncing! Debouncing is used to prevent burst events from triggering a function too many times. For example, if you’re listening to mouse move events, moving the mouse can spam a burst of event triggers. Debouncing will reduce the number of function calls by triggering the callback after a delay period of no activity. Bouncing also happens in hardware, where a switch activation can bounce the metal contacts inside, creating multiple signals. If you’re interested in hardware switch debouncing, see Ben Eater’s video on the 555 timer.

For my project, I took inspiration from Josh W Comeau’s debouncing snippet, which uses a timeout to debounce events. Whenever the input box is modified, a timeout is started, which executes the search after 25 milliseconds. If another keypress occurs within those 25 milliseconds, the timeout restarts. I played around with a couple of values, and 25 milliseconds seemed to work fine for both responsiveness and performance. However, you may want to use a longer duration to execute the search when the user completely stops typing, since 25 milliseconds feels like it’s trying to be real-time but is lagging.

See the commit on GitHub.

<script>
  // ...
  let debounceTimeout;
  let searchstring;

  function handleInput() {
    clearTimeout(debounceTimeout);
    debounceTimeout = setTimeout(() => {
      searchstring ? search(searchstring, setResult) : setResult(null);
    }, 25);
  }
</script>
<!-- ... -->
<input type="text" bind:value={searchstring} on:input={handleInput} placeholder="Search Emojis!" />
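
The same timeout pattern can also be pulled out into a small reusable helper. This is a generic sketch rather than the code used on the site; the debounce function and the debouncedSearch name are made up for illustration.

```javascript
// Hypothetical helper: returns a wrapped version of fn that only runs once
// delayMs milliseconds have passed without another call.
function debounce(fn, delayMs) {
  let timeoutId;
  return (...args) => {
    clearTimeout(timeoutId); // cancel the previously scheduled run, if any
    timeoutId = setTimeout(() => fn(...args), delayMs);
  };
}

// Three quick "keystrokes" result in a single search for the final value.
const debouncedSearch = debounce((query) => console.log(`searching for ${query}`), 25);
debouncedSearch("s");
debouncedSearch("sm");
debouncedSearch("smile");
```

Only the last call survives here, because each new call cancels the timeout started by the one before it.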

Conclusion

Searching and rendering large datasets can be difficult, but hopefully, this article gave you some ideas on improving the user experience in your projects. Building this project was a lot of fun, and I hope it’ll be a valuable tool in my toolbox. I actually used the website itself during the building of the website :0 🤯
