Ask HN: What are some cool but obscure data structures you know about?

skrebbel · on July 22, 2022

Not a very deep CS-y one, but still one of my favourite data structures: Promise Maps.

It only works in languages where promises/futures/tasks are a first-class citizen. Eg JavaScript.

When caching the result of an expensive computation or a network call, don't actually cache the result, but cache the promise that awaits the result. Ie don't make a

    Map<Key, Result>

but a

    Map<Key, Promise<Result>>

This way, if a new, uncached key gets requested twice in rapid succession, ie faster than the computation takes, you avoid computing/fetching the same value twice. This trick works because:

- promises hold on to their result value indefinitely (until they're GC'ed)

- you can await (or .then()) an existing promise as many times as you want

- awaiting an already-resolved promise is a very low-overhead operation.

In other words, the promise acts as a mutex around the computation, and the resulting code is understandable even by people unfamiliar with mutexes, locks and so on.

jtwigg · on July 22, 2022

FYI, if you plan to do this in ASP netcore, combine with AsyncLazy for the most optimal results https://github.com/davidfowl/AspNetCoreDiagnosticScenarios/b...

alexchro93 · on July 22, 2022

Neat page from David Fowler. I didn't know about that one. Thank you.

To further explain why you want to use an AsyncLazy (or Lazy for synchronous programming) when working with ConcurrentDictionary I find this article to be helpful: https://andrewlock.net/making-getoradd-on-concurrentdictiona...

dan-robertson · on July 22, 2022

Be careful with this data structure. If the language allows async exceptions or you have a big where the promise won’t become deferred, there are a lot of edge cases. Examples of edge cases:

- if the promise never becomes determined (eg bug, async exception) your app will wait forever

- if the promise has high tail latency things can be bad

- if the language eagerly binds on determined promises (ie it doesn’t schedule your .then function) you can get weird semantic bugs.

- changing the keys of the table incrementally instead of just using for memoisation can be really messy

hinkley · on July 22, 2022

I have a couple pieces of code where we had to add rate limiting because it was just too easy to make too many async calls all at once, and things only got 'worse' as I fixed performance issues in the traversal code that were creating an ersatz backpressure situation.

Philosophically, the main problem is that promise caches are a primary enabler of the whole cache anti-pattern situation. People make the mistake of thinking that 'dynamic programming' means memoization and 'memoization' means caching. "Everything you said is wrong." Trying to explain this to people has been a recurring challenge, because it's such a common, shallow but strongly-held misunderstanding.

Youtube recently suggested this video to me, which does a pretty good job of explaining beginning and intermediate DP:

https://www.youtube.com/watch?v=oBt53YbR9Kk

What I love most about this video is that it introduces memoization about 10 minutes in, but by 20% of the way through it has already abandoned it for something better: tabulation. Tabulation is an interative traversal of the problem that gathers the important data not only in one place but within a single stack frame. It telegraphs a information architecture, while recursive calls represent emergent behavior. The reliance on cache to function not only obscures the information architecture, it does so by introducing global shared state. Global shared state and emergent behavior are both poison to the long-term health of a project, and here we have a single data structure that represents both in spades.

We are supposed to be fighting accidental coupling, and especially global variables, not engaging in word games to hide what we are doing.

So I think my answer to OP's question is 'tabulation', which is part data structure and part algorithm.

wxnx · on July 22, 2022

I wrote some Python code recently that uses a similar data structure (Futures instead of Promises, without knowing necessarily about the data structure's use in JavaScript). It wasn't really for caching.

I modeled mine on the Django ASGI reference library's server implementation, which uses the data structure for maintaining references to stateful event consumers. Exception handling is done with a pre-scheduled long-running coroutine that looks at the map.

I'm curious about your second point -- why exactly do things get bad with high tail latency? Is it only a weakness of the data structure when used for caching? I'm having trouble picturing that.

dan-robertson · on July 23, 2022

Suppose the call to evaluate has a p95 of 5 seconds (this is very large of course). If your first call to compute a value hits it, all the subsequent requests for that cell are blocked for 5s. If you didn’t try to cache, only one might block for 5s and the rest could go through fast. On the other hand if you do 20 requests then about one of them will get the p95 latency.

bloudermilk · on July 22, 2022

Best practice in my experience is to use a timeout on all async operations to handle edge cases 1 & 2 above. The third case isn't possible with JavaScript AFAIK.

feoren · on July 22, 2022

- if the language eagerly binds on determined promises (ie it doesn’t schedule your .then function) you can get weird semantic bugs.

What would be an example of this? If the promise has been determined, why not just immediately run the .then function?

afiori · on July 23, 2022

The reason not to do it is to limit the non-determinism of the order of execution.

In Javascript

    f1();
    p.then(f3);
    f2();

functions are always called in the relative order f1, f2, f3 and never f1, f3 , f2 even if p had already resolved.

Shorel · on July 22, 2022

I know this as memoization with lazy evaluation, nothing new, but at the same time very useful, and I would argue, it is very CS.

hn_throwaway_99 · on July 22, 2022

not really quite lazy evaluation, thought, at least in Javascript. Promises begin execution as soon as they are created and there is no way to delay that.

marcosdumay · on July 22, 2022

Promises are an implementation of lazy evaluation for Javascript. This is exactly lazy evaluation.

By the way, lazy evaluation is the one that offers no guarantees about when your code will be executed. If you can delay the execution, it's not lazy.

anyfoo · on July 22, 2022

Wait. I thought lazy evaluation is defined as evaluation at the time a value is needed. After that I think it will then be in Weak Head Normal Form, which can be thought of as "evaluated at its top level"... but I'm a bit rusty. Basically, an expression gets used somewhere (e.g. pattern matched, in the ADT sense). If it's in WHNF, cool, it's evaluated already (subexpressions within may not yet, but that's their problem--that's exactly what WHNF says). If not, we evaluate it down to WHNF.

Wikipedia states it as: "When using delayed evaluation, an expression is not evaluated as soon as it gets bound to a variable, but when the evaluator is forced to produce the expression's value."

So if evaluation begins effectively at the time the promise is issued, not at the time the promise is awaited, then I would not call that lazy evaluation, and what hn_throwaway_99 said sounds correct to me.

Is my rusty old PL understanding wrong?

Sohcahtoa82 · on July 22, 2022

> Wait. I thought lazy evaluation is defined as evaluation at the time a value is needed.

Correct.

You've got it right, GP comment has it wrong.

Lazy does not simply mean "evaluated later". From what I understand, in JS, if you call an async function without `await`, it essentially gets added to a queue of functions to execute. Once the calling function stack completes, the async function will be executed.

A queue of functions to execute does not constitute "lazy evaluation" on its own.

cstrahan · on July 22, 2022

Definition on Wikipedia says otherwise:

“In programming language theory, lazy evaluation, or call-by-need,[1] is an evaluation strategy which delays the evaluation of an expression until its value is needed (non-strict evaluation) and which also avoids repeated evaluations (sharing).”

Haskell would be an apt example here.

9935c101ab17a66 · on July 23, 2022

Promises in javascript are absolutely not considered 'an implementation of lazy evaluation.' Promises are invoked eagerly in javascript.

sethammons · on July 22, 2022

I've always called it request piggy backing (atop memoization). Our db abstraction at my last gig absolutely required it for scaling purposes. Great tool for the tool belt.

deltasevennine · on July 22, 2022

Can someone explain to me why the second example is better.

To me it seems to be the same thing. Replace result with int and I literally do not see a problem with the first one.

Also why is a mutex or lock needed for Result in javascript? As far as I know... In a single threaded application, mutexes and locks are only needed for memory operations on two or more results.

With a single value, say an int, within javascript you are completely safe in terms of concurrent access. Only 2 more more variables can cause a race condition. Or am I wrong here?

--edit:

Thanks for the replies. I see now. The mutex concept is not around memory access or memory safety. It's solely around the computation itself. When the Promise is "pending". It has nothing to do with safety. The parent did mention this, but I completely glossed over it.

JimDabell · on July 22, 2022

You can put the promise into the cache immediately but you can only put the result from the promise into the cache once the promise resolves. So if an identical request comes in a second time before the promise has been resolved, then if you are caching the promise you have a cache hit but if you are caching the result then you have a cache miss and you end up doing the work twice.

epolanski · on July 22, 2022

I am still not understanding the purpose of this as I believe it is grounded on the wrong assumption.

Pretty much every single asynchronous operation other than some `Promise.resolve(foo)` where foo is a static value can fail. Reading from the file system, calling an api, connecting to some database, etc.

If the original promise fails you're gonna return a cached failure.

Mind you, I'm not stating this might be completely useless, but at the end of the day you will be forced to add code logic to check all of those asynchronous computation results which will eventually outweight the cost of only saving the resolved data.

lucideer · on July 22, 2022

> If the original promise fails you're gonna return a cached failure.

This is usually a good thing, and even in cases where it isn't, it's often a worthwhile trade-off.

In the common case a failure will either be persistent, or - if load/traffic related - will benefit from a reduction in requests (waiting a while to try again). In both of these cases, where your first key request fails, you want the immediately-following cases to fail fast: "caching" the promise caches the failure for the duration of one request (presumably the cache is emptied once the promise resolves, allowing subsequent key accesses to retry).

The less common case where the above isn't true is where you have very unstable key access (frequent one-off failures). In those cases you might want a cache miss on the second key request, but successful key retrieval usually isn't as critical in such systems which makes the trade off very worthwhile.

zamalek · on July 22, 2022

> If the original promise fails you're gonna return a cached failure.

In the scenario where that's an issue, you would need to add (extremely trivial 3-5 lines) logic to handle retrying a cached failure. The underlying data structure would continue to be a promise map.

dragonwriter · on July 22, 2022

> If the original promise fails you're gonna return a cached failure.

In many cases, another near-in-time request would also fail, so returning a cached failure rather than failing separately is probably a good idea (if you need retry logic, you do it within the promise, and you still need only a single instance.)

(If you are in a system with async and parallel computation both available, you can also use this for expensive to compute pure functions of the key.)

cerved · on July 22, 2022

You do one IO operation instead of two.

It's unlikely that always doing two is going to be more successful than just trying once and dealing with failure

deltasevennine · on July 22, 2022

It's a tiny optimization.

When the VERY FIRST ASYNC operation is inflight the cache is immediately loaded with a Promise, which blocks all other calls while the FIRST async operation is in flight. This is only relevant to the very FIRST call. That's it. After that the promise is pointless.

As for the Promise failure you can just think of that as equivalent of the value not existing in the cache. The logic should interpret the promise this way.

lightbendover · on July 22, 2022

It's not always a tiny optimization. If you have an expensive query or operation, this prevents potentially many duplicative calls.

A practical example of this was an analytics dashboard I was working on years ago -- the UI would trigger a few waves of requests as parts of it loaded (batching was used, but would not be used across the entire page). It was likely that for a given load, there would be four or more of these requests in-flight at once. Each request needed the result of an expensive (~3s+) query and computation. Promise caching allowed operations past the first to trivially reuse the cached promise.

There are certainly other approaches that can be taken, but this works very well as a mostly-drop-in replacement for a traditional cache that shouldn't cause much of a ripple to the codebase.

zbentley · on July 22, 2022

Yep, I've been bitten by exactly this failure mode.

You have to invalidate/bust the cache when a failure is returned (which is racy, but since cache busting is on the sad path it's a fine place to protect with a plain ol' mutex, assuming you even have real parallelism/preemption in the mix).

Alternatively you can cache promises to functions that will never return failure and will instead internally retry until they succeed. This approach generalizes less well to arbitrary promises, but is more friendly to implementing the custom back off/retry scheme of your choice.

balls187 · on July 22, 2022

It's not the cost of saving the resolved data.

If I understand the pattern correctly, it is to avoid multiple asynchronous requests to a resource that has yet to be cached.

gorjusborg · on July 22, 2022

Yeah, that's my understanding too.

It seems like an optimization to prevent lots of downstream requests that occur in rapid succession before the first request would have finished. I'd also suspect that the entry would be removed from the map on failure.

diputsmonro · on July 22, 2022

I believe the idea is that if you stored the result instead, you would have to wait for the promise to resolve the first time. If you made two requests in quick succession, the second request wouldn't see anything in the result map for the first one (as it hasn't resolved yet) and would then need to make its own new request.

If you store the promise, then not only does the second attempt know that it doesn't need to make a new request, but it can also await on the promise it finds in the map directly.

deltasevennine · on July 22, 2022

Wait a minute.

Under Javascript, The behavior induced by this pattern is EXACTLY equivalent to that of CACHING the result directly and a BLOCKING call. The pattern is completely pointless if you view it from that angle. Might as well use blocking calls and a regular cache rather then async calls and cached promises.

So then the benefit of this pattern is essentially it's a way for you to turn your async functions into cached non-async functions.

Is this general intuition correct?

--edit: It's not correct. Different async functions will still be in parallel. It's only all blocking and serial under the multiple calls to the SAME function.

ohgodplsno · on July 22, 2022

Say you have three elements in your UI that need to fetch some user info. You have two major options:

1/ Have a higher level source of truth that will fetch it once from your repository (how does it know it needs to fetch? How is cache handled ?) and distribute it to those three elements. Complex, makes components more independent but also more dumb. It's fine to have pure elements, but sometimes you just want to write <UserInfoHeader/> and let it handle its stuff.

2/ Your repository keeps this Map<Key, Promise<T>>, and every time you call getUserInfo(), it checks the map at key "userinfo" and either return the promise (which might be ongoing, or already resolved) or see that it's not there and do the call, writing the promise back into the map. This way, your three components can just call getUserInfo() without giving a damn about any other ones. The first one that calls it pre-resolves it for others.

As to why a promise instead of just the raw result: one can potentially return null (and you need to call again later to refresh, or straight up blocks during the entire call), the other one just gives you back promises and you can just listen to them and update your UI whenever it's ready (which might be in 5 seconds, or right now because the promise has already been resolved)

It's a bad implementation of a cached repository (because it ignores TTL and invalidation as well as many problems that need to be handled) that any junior developer could figure out (so it's everything but obscure), but sometimes, eh, you don't need much more.

Marazan · on July 22, 2022

Because the computation of Result is _expensive_ and _async_.

So if you get two requests for the computation that computes Result in close succession the Map doesn't have the result in place for the second call and as a result you get no caching effect and make 2 expensive requests.

By caching the Promise instead the second request is caught by the cache and simply awaits the Promise resolving so now you only get 1 expensive request.

Cthulhu_ · on July 22, 2022

I didn't know this was a formal design pattern; I do recall having implemented this myself once, as a means to avoid double requests from different components.

Later on I switched to using react-query which has something like this built-in, it's been really good.

polyrand · on July 22, 2022

This is only tangentially related to:

> you avoid computing/fetching the same value twice

But this problem comes up in reverse proxies too. In Nginx, you can set the `proxy_cache_lock`[0] directive to achieve the same effect (avoiding double requests).

[0]: https://nginx.org/en/docs/http/ngx_http_proxy_module.html#pr...

talos2110 · on July 22, 2022

That right there is Cache Stampede[0] prevention.

[0]: https://en.wikipedia.org/wiki/Cache_stampede

psytrx · on July 22, 2022

Interesting! This is an issue I had with an external API which I intended to cache on my serverless workers infra.

I hit the API's rate limiter when the workers invalidated their cache, even though I staggered the lifetime of the cache keys for each replicated instance. This is how I found out how many Cloudflare workers run on a single edge instance. Hint: It's many.

Didn't know it had a name. I'm delighted, thanks!

talos2110 · on July 22, 2022

You’re welcome, happy to help. If you are in the .NET space I suggest you to take a look at FusionCache. It has cache stampede protection built in, plus some other nice features like a fail-safe mechanism and soft/hard timeouts https://github.com/jodydonetti/ZiggyCreatures.FusionCache

wingspan · on July 22, 2022

I love this approach and have used it many times in JavaScript. I often end up adding an additional map in front with the resolved values to check first, because awaiting or then'ing a Promise always means you will wait until the next microtask for the value, instead of getting it immediately. With a framework like React, this means you'll have a flash of missing content even when it is already cached.

pflanze · on July 22, 2022

Replacing the promise with the result in the map (using a sum type as the map value type) might be more efficient than maintaining two maps?

skrebbel · on July 22, 2022

This surprises me. I had expected the microtask to be executed right after the current one, ie before any layouting etc - isnt that the whole point the "micro" aspect?

wingspan · on July 22, 2022

I was able to reproduce this in a simple example [1]. If you refresh it a few times you will be able to see it flash (at least I did in Chrome on Mac). It is probably more noticeable if you set it up as an SPA with a page transition, but I wanted to keep the example simple.

[1] https://codesandbox.io/s/recursing-glitter-h4c83u?file=/src/...

skrebbel · on July 22, 2022

I am deeply disturbed and confused. Do you understand why this happens?

chmod775 · on July 22, 2022

The event loop shouldn't continue the circle until the microtask queue is empty, from what I remember.

Browsers probably used to squeeze layouting and such things in when the event loop completed an iteration or became idle. However nowadays it's entirely possible that have, or will have, browsers that do even work in parallel while the javascript event loop is still running, possibly even beginning to paint while javascript is still running synchronously (haven't seen that yet).

I've seen instances where browsers seem to be like "nah, I'll render this part later" and only would render some parts of the page, but certain layers (scrolling containers) would not be filled on the first paint - despite everything being changed in one synchronous operation!* And I've seen them do that at least 5 years ago.

It seems to me the situation is complicated.

* There was a loading overlay and a list being filled with data (possibly thousands of elements). The loading overlay would be removed after the list was filled, revealing it. What often happened was that the overlay would disappear, but the list still looking empty for a frame or two.

asadawadia · on July 22, 2022

Yeah, saves you from thundering herd problems

https://github.com/ben-manes/caffeine cache does something similar by caching the future that gets returned on look-ups if the value is still being computed

MobiusHorizons · on July 22, 2022

I have a similar pattern when using an rxjs pipeline to do async tasks in response to changes in other values. The typical pattern would be

    readonly foo = bar.pipe(
        switchMap(v => doTask(v)),
        shareReplay(1),
    );

Where doTask returns a promise or an AsyncSubject. The shareRelpay allows foo to cache the last value so any late subscriber will get the most recent value. This has a problem of returning cached values while new work is happening (eg while a network request is in progress) so instead I like to write

    readonly foo = bar.pipe(
        map(v => doTask(v),
        shareReplay(1),
        switchMap(v => v),
    );

This way the shareReplay is caching the promise instead of the result. With this pattern I can interact with this pipeline from code like so

    async someEventHandler(userInput) {
        this.bar.next(userInput);
        const result = await firstValueFrom(this.foo);
        …
    }

Without the pattern, this code would get the old result if there was ever any cached value

idontwantthis · on July 22, 2022

It’s also the clearest and least buggy way to iterate over the results. Map over Await Promise.all(map).

unsupp0rted · on July 22, 2022

Partly off-topic, but in JS I tend to avoid iterating over promises with maps because it's always confusing what will/won't work, so I use a basic FOR loop because async/await works in there.

How bad is this? Should I switch to Promise.all instead?

mromnia · on July 22, 2022

Iterating with async/await means you wait for every result before making another call. You basically execute the async calls one by one.

Promise.all runs them all simultaneously and waits until they are all complete. It returns an array of results in the same order as the calls, so it's usually pretty straightforward to use.

Both approaches have a purpose, so it's not like you "should" strictly use one. But you should be aware of both and be able to use the one that fits the need.

dmix · on July 22, 2022

The third major strategy being Promise.any():

- as soon as any of them resolve (any)

- when all of them resolve (all)

- resolve each in order (for)

seedboot · on July 22, 2022

Promise.all/allSettled/etc will allow you to resolve your promises concurrently rather than awaiting each result in the for loop. Depends what your use case is, really.

idontwantthis · on July 23, 2022

Use my approach or yours, but never use forEach with await. I don’t remember exactly why, but it just doesn’t work and will lead to very confusing errors.

meowtimemania · on July 23, 2022

Because Array.forEach doesn’t return a value or anything that could be awaited. If you use Array.map you can create an Array of promises that can be awaited with Promise.all

moonchrome · on July 22, 2022

Just make sure Map access is thread safe - awaiting promises is - but updating/reading map keys usually isn't.

skrebbel · on July 22, 2022

Yep, in multithreaded langs you just want to use a ConcurrentMap or whatever the stdlib has for this. The write is very brief because you only write the promise and not eg keep a proper lock on the key until the result has arrived, so no need for a fancier datastructure.

unrealhoang · on July 22, 2022

js is single-threaded so no problem there.

moonchrome · on July 22, 2022

Since OP mentioned Mutexes I assumed he's dealing with multithreaded code, but with JS this works as advertised :)

abuckenheimer · on July 22, 2022

This is clever, otherwise you have to coordinate your cache setter and your async function call between potentially many concurrent calls. There are ways to do this coordination though, one implementation I've borrowed in practice in python is modes stampede decorator: https://github.com/ask/mode/blob/master/mode/utils/futures.p...

jchw · on July 23, 2022

I believe that any time you await/then you wind up going back to the event loop. It's cheap ish but definitely not free. This is, of course, to make the behavior more predictable, but in this case it might be undesirable. Unfortunately, I don't think there's any obvious way around it. It can sorta be done with vanilla callbacks, but that takes you out of the async/promises world.

ralusek · on July 22, 2022

I have written many tools that do something like this, very useful.

Just gotta be careful with memory leaks, so (for a TS example) you might want to do something like this:

    const promiseMap<key, Promise<any>> = new Map();

    async function keyedDebounce<R>(key: string, fn: () => R) {
      const existingPromise = promiseMap.get(key);
      if (existingPromise) return existingPromise;

      const promise = new Promise(async (resolve) => {
        const result = await fn(); // ignore lack of error handling

        // avoid memory leak by cleaning up
        promiseMap.delete(key); 

        // this will call the .then or awaited promises everywhere
        resolve(result); 
      });

      promiseMap.set(key, promise);

      return promise;
    }

So that the promise sticks around for that key while the function is active, but then it clears it so that you're not just building up keys in a map.

lolinder · on July 22, 2022

This is a great solution to the thundering herd problem, but isn't OP explicitly trying to cache the results for later? In that case, the promise map is a clever way to make sure the cachable value is fetched at most once, and you want the keys to build up in the map, so GC'ing them is counterproductive.

ralusek · on July 22, 2022

Ya it obviously depends on the intended goal, but caching items in a map indefinitely better be done on a set of values that is known to be limited to a certain size over the lifetime of the server or it'll lead to a memory leak.

kevinforrestcon · on July 22, 2022

If the result of the async function is not necessarily the same value every time you call it, you may want to recompute it

blue_pants · on July 22, 2022

Please correct me if I'm wrong. But you'd better avoid creating extra promises for performance reasons

  const promiseMap<key, Promise<any>> = new Map();

  async function keyedDebounce<R>(key: string, fn: () => R) {
    const existingPromise = promiseMap.get(key);
    if (existingPromise) return existingPromise;
  
    const promise = fn().then(result => {
      promiseMap.delete(key);
  
      return result;
    });
  
    promiseMap.set(key, promise);
  
    return promise;
  }

jdthedisciple · on July 22, 2022

You check for if(promiseMap.get(key)) and in the NO case you do promiseMap.delete(key)?

Could you explain why that's necessary? (sorry probs a stupid question)

ralusek · on July 22, 2022

So if the promise still exists, that means that there is an active call out for a promise using this key already, so we'll just return it. We know this because when the function that is executing finishes, we delete the promise from the map.

A purpose of this might be, let's say you rebuild some structure very frequently, like hundreds of times per second or whatever. You can call this function for a given key as many times as you want while that async process is executing, and it will only execute it once, always returning the same promise.

aidos · on July 22, 2022

I’m the no case, the promise is created, and then deleted after the function is resolved.

I’m not entirely sure of the use of this pattern where your using a map and deleting the keys as you go - seems like you’d end up doing more work kinda randomly depending on how the keys were added. I’d just relive the whole map when I was done with it.

withinboredom · on July 22, 2022

Won’t JS garbage collect orphaned references? Why is this necessary?

deckard1 · on July 22, 2022

there is some confusion here.

OP is intentionally caching results of, presumably, expensive computations or network calls. It's a cache. There is no memory leak. OP just did not detail how cache invalidation/replacement happens. The 2nd comment adds a rather rudimentary mechanism of removing items from cache as soon as they are ready. You get the benefit of batching requests that occur before the resolve, but you get the downside of requests coming in right after the resolve hitting the expensive process again. Requests don't neatly line up at the door and stop coming in as soon as your database query returns.

Typically you would use an LRU mechanism. All caches (memcached, redis, etc.) have a memory limit (whether fixed or unbounded, i.e. all RAM) and a cache replacement policy.

stickupkid · on July 22, 2022

It does, but the map has a reference to it, so it will "leak" (in gc languages an unwanted reference is considered a leak). If this map got rather large, you could end up with a rather large heap and it would be un-obvious why at first.

withinboredom · on July 22, 2022

Do Promises hold a reference to the chain of functions that end in the result? If so, that seems like a bug.

epolanski · on July 22, 2022

A promise is just an object, it does not contain any reference to the chained functions.

andrewingram · on July 22, 2022

Could you use a WeakMap for this instead?

skrebbel · on July 22, 2022

If the key is a type that you expect to be GC'ed, yes, totally. If the key is a simple value type eg a string then it behaves the same as a regular Map or an object.

andrewingram · on July 22, 2022

Ah yeah, good point.

I've mostly used this overall pattern with something like Dataloader, where you throw away the entire cache on every request, so the GC problem doesn't crop up.

fabiancook · on July 23, 2022

https://github.com/tc39/proposal-function-memo

This might be relevant to this exact pattern

  const getThing = (async function (arg1, arg2) {
    console.log({ arg1, arg2 });
    return arg1 + (arg2 || 0);
  }).memo();

  console.log(await getThing(1)); // New promise
  console.log(await getThing(2)); // New promise
  console.log(await getThing(1)); // Returns promise from earlier

nurettin · on July 23, 2022

>> Not a very deep CS-y one

yes, in fact this is more like a pattern than an actual data structure, since you might as well replace it with a list of futures. And a comparison between future based concurrency and thread based parallelism is like apples to oranges.

But it isn't surprising that this pattern ended up at the top since it is what the users of this site would be most familiar with.

philsnow · on July 23, 2022

Vitess does something similar, "query coalescing", which was designed for loads that are very read-heavy. If 1000 requests come in that end up making the same read query from the database, and they all came in in the same 50ms that it takes to get the results from the database, you can just return the same answer to all of them.

bschaeffer · on July 22, 2022

I wrote a version of this with Elixir: https://github.com/bschaeffer/til/tree/master/elixir/gen_ser...

Didn't know what to call it but PromiseMaps is nice name for it.

lightbendover · on July 22, 2022

Promise Caching has a nicer name to it since that's what it is if well-implemented.

danellis · on July 23, 2022

> It only works in languages where promises/futures/tasks are a first-class citizen. Eg JavaScript.

Why would that be true?

jchw · on July 23, 2022

I think the actual answer is that it only works in languages where you can eagerly execute async code disconnected from the Promise/Future construct. It would've worked in ES3 using a Promise polyfill. (Which is notable because it does not fit the aforementioned criteria—it had no real async support.)

hinkley · on July 23, 2022

It isn’t. The first implementation of promises I worked on was in Java. We used it as part of verifying certificate chains. If you use threads it doesn’t matter if the operations are blocking, as long as you move them off-thread.

yla92 · on July 22, 2022

Anybody knows if there is similar pattern for Python's asyncio?

abrichr · on July 22, 2022

How about a dict whose values are asyncio.Future?

https://docs.python.org/3/library/asyncio-future.html#asynci...

ActorNightly · on July 22, 2022

https://docs.python.org/3.9/library/asyncio-future.html#asyn...

acoye · on July 22, 2022

I wonder if Swift's AsyncAwait could be used in such a way.

Ryuuke5 · on July 22, 2022

Sure it's possible, we need to await for the Task if it already exists on the dictionary, for example we could imagine writing something like this inside an actor to make it threadSafe.

    private var tasksDictionary: Dictionary<String, Task<Data, Error>>

    func getData(at urlString: String) async throws -> Data {
        if let currentTask = tasksDictionary[urlString] {
            return await currentTask.value
        }
        let currentTask = Task { 
           return try await URLSession.shared.data(from: URL(string:urlString)!) 
        }
        tasksDictionary[urlString] = currentTask
        let result = await currentTask.value
        tasksDictionary[urlString] = nil
        return result
    }

ninkendo · on July 22, 2022

Hmm, why set `taskDictionary[urlString] = nil` at the bottom there? If the whole point is to cache the result, isn't the point to leave the value there for other requests to pick it up?

Ryuuke5 · on July 22, 2022

Yup, nice catch, no need to reset it to nil if you wanna keep it in memory indefinitely.

I guess making it nil can be also used if you don't wanna make the same request when there is one already in flight, in case you have a 403 error and need to request a refresh token, you don't wanna make two simultaneous requests, but you also don't wanna catch it indefinitely either.

afro88 · on July 22, 2022

Great! So much nicer than the Combine (or Rx) version.

mperreux · on July 22, 2022

This is great. I'm actually running into this problem but across distributed clients. Is there a distributed way to achieve somthing like this with say Redis?

jawilson · on July 22, 2022

Technically yes, the promise can first do an RPC to a distributed key/value store, and only then do the expensive computation (which typically is itself an RPC to some other system, perhaps a traditional database). Promise libraries should make this pretty simple to express (though always uglier than blocking calls).

I say expensive because even in a data center, an RPC is going to take on the order of 1ms. The round-trip from a client like a browser is much greater especially where the client (if this is what you meant by "client") has a poor internet connection.

Therefore this pattern is usually done inside of an API server in the data-center. Additional benefits of the server side approach is that an API server can better control the cache keys (you could put the API server "build number" in the cache key so you can do rolling updates of your jobs), TTL, etc. Also, you don't want a rogue computer you don't control to directly talk to a shared cache since a rogue computer could put dangerous values into your cache, remove elements from the cache they shouldn't, etc.

lalaithion · on July 22, 2022

What do you mean by first class citizen, here? I’m pretty sure this works in all languages with promises, but I might be misunderstanding something.

studifi · on July 23, 2022

The data structure here is a Map<T>

hmm-interesting · on July 22, 2022

can you please explain this a bit more how this avoid fetching the same value twice? maybe with example.

kube-system · on July 22, 2022

It sounds like, if you have a process that takes a long time to do, you check to see if you already have initiated that process (see if the key is in the list of promises), rather than firing off another promise for the same request.

If you get 10 requests to fire function lookUpSomethingThatTakesAWhile(id) that returns a promise, rather than firing off 10 promises, you fire it off once, look up the ID in your map for the next nine, and return that same promise for the next 9.

cauthon · on July 22, 2022

What is a promise in this context?

cpburns2009 · on July 22, 2022

A promise is an object representing an asynchronous result. Some languages, such as Python, call them futures. A promise object will typically have methods for determining if result is available, getting the result if it is, and registering callback functions to be called with the result when it's available.

cauthon · on July 29, 2022

Thank you!

maest · on July 22, 2022

More generally, I believe this is a monad structure.

pdpi · on July 22, 2022

Promises are not monads, for one simple reason: they're not referentially transparent. The whole point of OP's example is to double down and take advantage of that.

epolanski · on July 22, 2022

Referential transparency is a property of expressions, not type classes. Referential transparency is nowhere in the definition of a monad because a monad is obviously not an expression but a type class.

It is definitely true that Promise is a data type that _could_ admit a monad instance. It has:

- a data type M a which is Promise a

- a function pure with the signature a => M a, which is x => Promise.resolve(x)

- a bind function with the signature M a => a => M b => M b which is essentialy `then`

But a monad requires three properties to hold true: Right identity, Left identity and associativity. Right identity holds for every function `f: a => M b`, but left identity and associativity do not hold.

brewmarche · on July 22, 2022

Maybe comonads fit better

samatman · on July 22, 2022

promise map?

do you mean

    setmetatable({}, {__index = function(tab, key)
          local co = coroutine.running()
          uv.read(fd, function(err, result)
               tab[key] = result
               return coroutine.resume(result)
            end)
          return coroutine.yield()
      end })

This is an incomplete implementation, clearly: but I've never missed Promises with coroutines and libuv.

jiggawatts · on July 22, 2022

Cache-Oblivious Data Structures: https://cs.au.dk/~gerth/MassiveData02/notes/demaine.pdf

A vaguely related notion is that naive analysis of big-O complexity in typical CS texts ignores over the increasing latency/cost of data access as the data size grows. This can't be ignored, no matter how much we would like to hand-wave it away, because physics gets in the way.

A way to think about it is that a CPU core is like a central point with "rings" of data arranged around it in a more-or-less a flat plane. The L1 cache is a tiny ring, then L2 is a bit further out physically and has a larger area, then L3 is even bigger and further away, etc... all the way out to permanent storage that's potentially across the building somewhere in a disk array.

In essence, as data size 'n' grows, the random access time grows as sqrt(n), because that's the radius of the growing circle with area 'n'.

Hence, a lot of algorithms that on paper have identical performance don't in reality, because one of the two may have an extra sqrt(n) factor in there.

This is why streaming and array-based data structures and algorithms tend to be faster than random-access, even if the latter is theoretically more efficient. So for example merge join is faster than nested loop join, even though they have the same performance in theory.

vvanders · on July 22, 2022

Realtime collision detection[1] has a fantastic chapter in this with some really good practical examples if I remember right.

Great book, I used to refer to it as 3D "data structures" book which is very much in theme with this thread.

[1] https://www.amazon.com/Real-Time-Collision-Detection-Interac...

trebcon · on July 22, 2022

The implicit grid data structure from there is a personal favorite of mine. I used it in a game once and it performed incredibly well for our use case.

It's a bit too complicated to totally summarize here, but it uses a bit per object in the scene. Then bit-wise operations are used to perform quick set operations on objects.

This data structure got me generally interested in algorithms that use bits for set operations. I found the Roaring Bitmap github page has a lot of useful information and references wrt this topic: https://github.com/RoaringBitmap/RoaringBitmap

phkahler · on July 22, 2022

I spent a lot of time optimizing an octree for ray tracing. It is my favorite "spatial index" because it's the only one I know that can be dynamically updated and always results in the same structure for given object placement.

thegeomaster · on July 22, 2022

It's a stellar book. I work on a commercial realtime physics engine, the "orange book" is practically required reading here.

SotCodeLaureate · on July 22, 2022

I often recommend this book, RTCD, as a "better intro to data structures and algorithms" (sometimes as "the best"). The advice often falls on deaf ears, unfortunately.

Gene_Parmesan · on July 22, 2022

Seconding the RTCD recommendation. Handy code examples, and my favorite part is that the book is real-world-performance-conscious (hence the "real-time" in the title).

cvoss · on July 22, 2022

> In essence, as data size 'n' grows, the random access time grows as sqrt(n), because that's the radius of the growing circle with area 'n'.

I was about to write a comment suggesting that if we made better use of three dimensional space in constructing our computers and data storage devices, we could get this extra latency factor down to the cube root of n.

But then, I decided to imagine an absurdly large computer. For that, one has to take into account a curious fact in black hole physics: the radius of the event horizon is directly proportional to the mass, rather than, as one might expect, the cube root of the mass [1]. In other words, as your storage medium gets really _really_ big, you also have to spread it out thinner and thinner to keep it from collapsing. No fixed density is safe. So, in fact, I think the extra factor for latency in galactic data sets is neither the square root nor cube root, but n itself!

[1] https://en.wikipedia.org/wiki/Schwarzschild_radius

samus · on July 22, 2022

The main difficulty with growing computing devices in three dimensions is cooling. All that heat has to be somehow carried away. It's already difficult with 2D devices. One would have to incorporate channels for cooling liquids into the design, which takes a bite out of the gains from 3D.

https://www.anandtech.com/show/12454/analyzing-threadripper-...

chriswarbo · on July 22, 2022

This series of blog posts explains this sqrt behaviour quite nicely, covering both theory (including black holes) and practice:

http://www.ilikebigbits.com/2014_04_21_myth_of_ram_1.html

6510 · on July 22, 2022

If its stored far away enough re-doing the calculation is faster. Depending on how much accuracy you need you might as well load a 2022-07-22 07:26:34 earth and have that guy on HN evaluate his thoughts all over again.

l33t2328 · on July 22, 2022

How do you accurately include those sqrt(n)’s in your analysis?

physicsguy · on July 22, 2022

I spent a long time looking at an algorithm for long range force calculations called the Fast Multipole Method which is O(N) and found that practically it couldn't compete against an O(N log N) method we used that involved FFTs because the coefficient was so large that you'd need to simulate systems way bigger than is feasible in order for it to pay off because of cache locality, etc.

jsmith45 · on July 22, 2022

Honestly, that is not too uncommon. For top level algorithm choice analysis, "rounding" O(log(n)) to O(1), and O(n*log(n)) to O(n), is often pretty reasonable for algorithm comparison, even though it is strictly incorrect.

It is not uncommon for a simple O(n*log(n)) algorithm to beat out a more complex O(n) algorithm for realistic data sizes. If you "round" them to both O(n), then picking the simpler algorithm because the obvious choice, and can easily turn out to have better perf.

physicsguy · on July 22, 2022

To be fair, in the general case (particles placed in arbitrary positions), it worked very well. The issue was that in the area of electromagnetics I worked on at the time, we make an assumption about things being placed on a regular grid, and that allows you to do the force calculation by convolving a kernel with the charges/dipole strength in Fourier space.

thisisnotatest · on July 29, 2022

A helpful little heuristic that always stuck with me from a Stanford lecture by Andrew Ng was, "log N is less than 30 for all N".

xigoi · on July 22, 2022

https://en.wikipedia.org/wiki/Galactic_algorithm

mmorse1217 · on July 22, 2022

This implementation and associated paper improve the cache locality to make the algorithm more competitive https://github.com/dmalhotra/pvfmm

They compare solving toy problems with different methods here https://arxiv.org/pdf/1408.6497.pdf

But as you mention, this is just fiddling with constants, and it still may be too costly for your problem

JackFr · on July 22, 2022

I saw an amazing presentation on Purely Functional Data structures with a really approachable and understandable proof on ho w a particular structure was asymptotically constant or linear time (I believe), and then followed up by saying that in practice none of it worked as well as a mutable version because of caching and data locality.

Oh well.

qwrshr · on July 22, 2022

Is there by any chance a publicly available recording of this presentation on purely functional data structures?

JackFr · on July 22, 2022

NE Scala Symposium Feb 2011

http://vimeo.com/20262239 Extreme Cleverness: Functional Data Structures in Scala

or maybe

http://vimeo.com/20293743 The Guerrilla Guide to Pure Functional Programming

(At work, so I can't check the video)

Both of them were excellent speakers

itamarst · on July 22, 2022

Does anyone actually use cache-oblivious data structure in practice? Not, like, "yes I know there's a cache I will write a datastructure for that", that's common, but specifically cache-oblivious data structures?

People mention them a lot but I've never heard anyone say they actually used them.

jiggawatts · on July 22, 2022

To an extent, the B-Tree data structure (and its variants) are cache oblivious. They smoothly improve in performance with more and more cache memory, and can scale down to just megabytes of cache over terabytes of disk.

The issue is that the last tree level tends to break this model because with some caching models it is "all or nothing" and is the biggest chunk of the data by far.

There are workarounds which make it more truly cache oblivious. How often this is used in production software? Who knows...

jfim · on July 22, 2022

There are also Judy arrays: https://en.wikipedia.org/wiki/Judy_array

bruce343434 · on July 22, 2022

This sounded promising but then I saw a performance benchmark [1] that was done on a pentium(!) with a very suboptimal hash table design compared to [2] and [3].

The Judy site itself [4] is full of phrases like "may be out of date", when the entire site was last updated in 2004. If it was already out of date in 2004...

It seems that Judy arrays are just kind of a forgotten datastructure, and it doesn't look promising enough for me to put in effort to revive it. Maybe someone else will?

[1] http://www.nothings.org/computer/judy/

[2] https://probablydance.com/2017/02/26/i-wrote-the-fastest-has...

[3] https://probablydance.com/2018/05/28/a-new-fast-hash-table-i...

[4] http://judy.sourceforge.net/

fanf2 · on July 22, 2022

Judy arrays are (under the covers) a kind of radix tree, so they are comparable to things like ART (adaptive radix tree), HAMT (hashed array-mapped trie), or my qp-trie. Judy arrays had some weird issues with patents and lack of source availability in the early days. I think they are somewhat overcomplicated for what they do.

Regarding cache-friendiness, a neat thing I discovered when implementing my qp-trie is that it worked amazingly well with explicit prefetching. The received wisdom is that prefetching is worthless in linked data structures, but the structure of a qp-trie makes it possible to overlap a memory fetch and calculating the next child, with great speed benefits. Newer CPUs are clever enough to work this out for themselves so the explicit prefetch is less necessary than it was. https://dotat.at/prog/qp/blog-2015-10-11.html

I guess I partly got the idea for prefetching from some of the claims about how Judy arrays work, but I read about them way back in the 1990s at least 15 years before I came up with my qp-trie, and I don't know if Judy arrays actually use or benefit from prefetch.

efreak · on July 22, 2022

It seems to me that the project is still active,: https://sourceforge.net/p/judy/bugs/29/

pdpi · on July 22, 2022

Cache-obliviousness is an important part of the whole array-of-structs vs structs-of-arrays. One of the advantages of the struct-of-arrays strategy is that it is cache-oblivious.

artemisart · on July 22, 2022

Cache oblivious data structures are absolutely used in every serious database.

From what I remember/can find online (don't take this as a reference): B-tree: relational DB, SQLite, Postgres, MySQL, MongoDB. B+-tree: LMDB, filesystems, maybe some relational DBs. LSM trees (Log-structured merge tree, very cool) for high write performance: LevelDB, Bigtable, RocksDB, Cassandra, ScyllaDB, TiKV/TiDB. B-epsilon tree: TokuDB, not sure if it exists anymore. COLA (Cache-oblivious lookahead array): I don't know where it's used.

Maybe modern dict implementations can qualify too, e.g. Python.

zimbatm · on July 23, 2022

Yes, LSM trees are specifically designed with this propery in mind.

robotresearcher · on July 22, 2022

Doing a decent job of analysis would include the sqrt(n) memory access factor. Asymptotic analysis ignores constant factors, not factors that depend on data size. It's not so much 'on paper' as 'I decided not to add a significant factor to my paper model'.

Cache oblivious algorithms attempt to make the memory access term insignificant so you could continue to ignore it. They are super neat.

cratermoon · on July 22, 2022

> the increasing latency/cost of data access as the data size grows

Latency Numbers Everyone Should Know https://static.googleusercontent.com/media/sre.google/en//st...

NavinF · on July 22, 2022

The ratios are mostly correct, but the absolute numbers are wrong these days. For example, memory read latency can be 45ns (not 100) with a decent DDR4 kit and data center latency can be 10us (not 500us) with infiniband.

umanwizard · on July 22, 2022

Do we mean different things by "merge join" and "nested loop join" ? For me "merge join" is O(n) (but requires the data to be sorted by key) whereas "nested loop join" is O(n^2).

mountainreason · on July 22, 2022

Wouldn't you at least be looking at nlog(n) for the sort in the merge join?

umanwizard · on July 22, 2022

Yes, if the data is not already sorted. Thus it's O(n) for already sorted data and O(n log(n) + n) -- which simplifies to O(n log(n)) -- for arbitrary data.

mountainreason · on July 22, 2022

Yeah. I know. Why would the data already be sorted?

umanwizard · on July 22, 2022

In a database you might have already built an index on that data.

robalni · on July 22, 2022

I was experimenting a while ago with something that I think is related to this.

If you have a quadratic algorithm where you need to compare each pair of objects in a large array of objects, you might use a loop in a loop like this:

    for (int i = 0; i < size; i++) {  for (int j = 0; j < size; j++) { do_something(arr[i], arr[j]); }  }

When this algorithm runs, it will access every cache line or memory page in the array for each item, because j goes through the whole array for each i.

I thought that a better algorithm might do all possible comparisons inside the first block of memory before accessing any other memory. I wanted this to work independently of the size of cache lines or pages, so the solution would be that for each power of 2, we access all items in a position lower than that power of 2 before moving to the second half of the next power of 2.

This means that we need to first compare indexes (0,0) and then (1,0),(1,1),(0,1) and then all pairs of {0,1,2,3} that haven't yet been compared and so on.

It turns out that the sequence (0,0),(1,0),(1,1),(0,1) that we went through (bottom-left, bottom-right, top-right, top-left in a coordinate system) can be recursively repeated by assuming that the whole sequence that we have done so far is the first item in the next iteration of the same sequence. So the next step is to do (0,0),(1,0),(1,1),(0,1) again (but backwards this time) starting on the (2,0) block so it becomes (2,1),(3,1),(3,0),(2,0) and then we do it again starting on the (2,2) block and then on the (0,2) block. Now we have done a new complete block that we can repeat again starting on the (4,0) block and then (4,4) and then (0,4). Then we continue like this through the whole array.

As you can see, every time we move, only one bit in one index number is changed. What we have here is actually two halves of a gray code that is counting up, half of it in the x coordinate and the other half in y. This means that we can iterate through the whole array in the described way by making only one loop that goes up to the square of the size and then we get the two indexes (x and y) by calculating the gray code of i and then de-interleaving the result so that bits 0,2,4... make the x coordinate and bits 1,3,5... make y. Then we compare indexes x and y:

    for (int i = 0; i < size * size; i++) { get_gray_xy(i, &x, &y); do_something(arr[x], arr[y]); }

The result of all this was that the number of cache line/page switches went down by orders of magnitude but the algorithm didn't get faster. This is probably for two reasons: incremental linear memory access is very fast on modern hardware and calculating the gray code and all that added an overhead.

phkahler · on July 22, 2022

While reading your post I kept thinking "de-interleave the bits of a single counter" since I have used that before to great benefit. It does suffer from issues if your sizes are not powers of two, so your grey code modification seems interesting to me. I'll be looking in to that.

FYI this also extends to higher dimensions. I once used it to split a single loop variable into 3 for a "normal" matrix multiplication to get a huge speedup. Unrolling the inner 3 bits is possible too but a bit painful.

Also used it in ray tracing to scan pixels in Z-order, which gave about 10 percent performance improvement at the time.

But yes, I must look at this grey code step to see if it makes sense to me and understand why its not helping you even with reduced cache misses.

robalni · on July 23, 2022

Interestingly I get more cache misses with the gray code solution but still fewer switches between cache lines. This has to be because the cache line prefetcher can predict the memory usage in the simple version but it's harder to predict in the gray code or de-interleave version.

Also, I have never thought about de-interleaving bits without using gray code but that seems to work similarly. The main reason I used gray code is that then we don't have to switch memory blocks as often since only one of the indexes changes every time so one of the current memory blocks can always stay the same as in the previous iteration.

samsquire · on July 22, 2022

This is similar to what I'm working on.

I am working on machine sympathetic systems. One of my ideas is concurrent loops.

Nested loops are equivalent to the Nth product of the loop × loop × loop or the Cartesian product.

  for letter in letters:
   for number in numbers:
    for symbol in symbols:
     print(letter + number + symbol)

If len(letters) == 3, len(numbers) == 3, len(symbols) == 3. If you think of this as loop indexes "i", "j" and "k" and they begin at 000 and go up in kind of base 3 001, 001, 002, 010, 011, 012, 020, 100 and so on.

The formula for "i", "j" and "k" is

  N = self.N
  for index, item in enumerate(self.sets):
    N, r = divmod(N, len(item))
    combo.append(r)

So the self.N is the product number of the nested loops, you can increment this by 1 each time to get each product of the nested loops.

We can load balance the loops and schedule the N to not starve the outer loops. We can independently prioritize each inner loop. If you represent your GUI or backend as extremely nested loops, you can write easy to understand code with one abstraction, by acting as if loops were independent processes.

So rather than that sequence, we can generate 000, 111, 222, 010, 230, 013 and so on.

Load balanced loops means you can progress on multiple items simultaneously, concurrently. I combined concurrent loops with parallel loops with multithreading. I plan it to be a pattern to create incredibly reactive frontends and backends with low latency to processing. Many frontends lag when encrypting, compressing or resizing images, it doesn't need to be that slow. IntelliJ doesn't need to be slow with concurrent loops and multithreading.

See https://github.com/samsquire/multiversion-concurrency-contro...

Loops are rarely first class citizens in most languages I have used and I plan to change that.

I think I can combine your inspiration of gray codes with this idea.

I think memory layout and data structure and algorithm can be 3 separate decisions. I am yet to see any developers talk of this. Most of the time the CPU is idling waiting for memory, disk or network. We can arrange and iterate data to be efficient.

samsquire · on July 22, 2022

Link to my concurrent loop repository.

https://github.com/samsquire/multiversion-concurrency-contro...

samsquire · on July 22, 2022

I decided to try implement get_gray_xy. I am wondering how you can generalise it to any number of loops and apply it to nested loops of varying sizes, that is if you have three sets and the sizes are 8, 12, 81.

Wikipedia says graycodes can be found by num ^ (num >> 1)

  a = [1, 2, 3, 4]
  b = [2, 4, 8, 16]
  indexes = set()
  correct = set()

  print("graycode loop indexes")
  for index in range(0, len(a) * len(b)):
    code = index ^ (index >> 1)
    left = code & 0x0003
    right = code >> 0x0002 & 0x0003
  
    print(left, right)
    indexes.add((left, right))
  
  assert len(indexes) == 16
  
  print("regular nested loop indexes")
  
  for x in range(0, len(a)):
    for y in range(0, len(b)):
      correct.add((x,y))
      print(x, y)
  
  
  assert correct == indexes

This produces the sequence:

    graycode loop indexes
    0 0
    1 0
    3 0
    2 0
    2 1
    3 1
    1 1
    0 1
    0 3
    1 3
    3 3
    2 3
    2 2
    3 2
    1 2
    0 2
    regular nested loop indexes
    0 0
    0 1
    0 2
    0 3
    1 0
    1 1
    1 2
    1 3
    2 0
    2 1
    2 2
    2 3
    3 0
    3 1
    3 2
    3 3

robalni · on July 22, 2022

Here is the implementation of get_gray_xy that I used:

    static uint64_t deinterleave(uint64_t x) {
        x = x & 0x5555555555555555;
        x = (x | (x >>  1)) & 0x3333333333333333;
        x = (x | (x >>  2)) & 0x0f0f0f0f0f0f0f0f;
        x = (x | (x >>  4)) & 0x00ff00ff00ff00ff;
        x = (x | (x >>  8)) & 0x0000ffff0000ffff;
        x = (x | (x >> 16)) & 0x00000000ffffffff;
        return x;
    }
    static void get_gray_xy(uint64_t n, uint64_t *x, uint64_t *y) {
        uint64_t gray = n ^ (n >> 1);
        *x = deinterleave(gray);
        *y = deinterleave(gray >> 1);
    }

I don't think the gray code solution can work well for sets of different sizes because the area of indexes that you go through grows as a square.

But it does work for different number of dimensions or different number of nested loops. For 3 dimensions each coordinate is constructed from every 3rd bit instead of 2nd.

So if you have index 14, then gray code is 9 (1001) which means in two dimensions x=1,y=2 and in three dimensions, x=3,y=0,z=0.

samsquire · on July 22, 2022

Thank you for sharing your idea and your code and thoughts and thanks for your reply.

I would like to generalise it for sets of different sizes. Maybe it's something different to a gray codes which you taught me today, it feels that it should be simple.

Maybe someone kind can step in and help me or I can ask it on Stackoverflow.

I need to think it through.

At one point in time I was reading a section on Intel's website of cache optimisation using a tool that shows you the cache behaviour of the CPU and column or row major iteration it had a GUI. I found it very interesting but I cannot seem to find it at this time. I did find some medium article of Cachediff.

phkahler · on July 22, 2022

deinterleave doesn't actually produce gray code does it? The GP said to convert to gray code before doing the deinterleave.

jve · on July 22, 2022

> but the algorithm didn't get faster

Could it be branch predictor at play? https://stackoverflow.com/questions/11227809/why-is-processi...

chakkepolja · on July 23, 2022

http://queue.acm.org/detail.cfm?id=1563874

http://deliveryimages.acm.org/10.1145/1570000/1563874/jacobs...

jtolmar · on July 22, 2022

The union-find data structure / algorithm is useful and a lot of fun.

The goal is a data structure where you can perform operations like "a and b are in the same set", "b and c are in the same set" and then get answers to questions like "are a and c in the same set?" (yes, in this example.)

The implementation starts out pretty obvious - a tree where every element either points at itself or some thing it was merged with. To check if two elements are in the same set, check if they have the same parent. Without analyzing it, it sounds like you'll average findRoot() performance of O(log(n)), worst-case O(n).

There are a couple of simple optimizations you can do to this structure, the type of things that seem like they shouldn't end up affecting asymptotic runtime all that much. The first is that, whenever you find a root, you can re-parent all the nodes you visited on the way to that root, so they'll all be quicker to look up next time. The other is that you keep track of the size of sets, and always make the larger set be the parent of the smaller.

And neither of those actually do anything impressive alone, but if you use both, the algorithm suddenly becomes incredibly fast, with the slowest-growing (non-constant) complexity I've ever heard of: O(the inverse of the Ackermann function(n)). Or, for any reasonable N, O(4 or less).

contificate · on July 22, 2022

Agreed, union-find is great.

My favourite usage of it in practice is for solving first-order (syntactic) unification problems. Destructive unification by means of rewriting unbound variables to be links to other elements is a very pretty solution - especially when you consider the earlier solutions such as that of Robinson's unification which, when applied, often involves somewhat error-prone compositions of substitutions (and is much slower).

The fact that the forest of disjoint sets can be encoded in the program values you're manipulating is great (as opposed to, say, the array-based encoding often taught first): makes it very natural to union two elements, chase up the tree to the set representative, and only requires minor adjustment(s) to the representation of program values.

Destructive unification by means of union-find for solving equality constraints (by rewriting unbound variables into links and, thus, implicitly rewriting terms - without explicit substitution) makes for very easy little unification algorithms that require minimal extension to the datatypes you intend to use and removing evidence of the union-find artefacts can be achieved in the most natural way: replace each link with its representative by using find(l) (a process known as "zonking" in the context of type inference algorithms).

Basic demonstration of what I'm talking about (the incremental refinement of the inferred types is just basic union-find usage): https://streamable.com/1jyzg8

divan · on July 22, 2022

I remember Ravelin shared their story of using it in card fraud detection code.

Basically they managed to replace a full-fledged graph database with small piece of code using union-find in Go. Was a great talk.

https://skillsmatter.com/skillscasts/8355-london-go-usergrou...

kragen · on July 22, 2022

My writeup of union find is in https://dercuano.github.io/notes/incremental-union-find.html. I did not in fact find a way to make it efficiently support incremental edge deletion, which is what I was looking for.

thaumasiotes · on July 22, 2022

> I did not in fact find a way to make it efficiently support incremental edge deletion, which is what I was looking for.

I don't understand this goal. The interior connections aren't relevant to a union-find structure; ideally you have a bunch of trees of depth 2 (assuming the root is at depth 1), but the internal structure could be anything. That fact immediately means that the consequence of removing an edge is not well-defined - it would separate the two objects joined by the edge, and it would also randomly divide the original connected set into two connected sets, each containing one of those two objects. Which object each "third party" object ended up being associated with would be an artifact of the interior structure of the union-find, which is not known and is constantly subject to change as you use the union-find.

If all you want is to be able to remove a single object from a connected set, on the assumption that your union-find always has the ideal flat structure, that's very easy to do - call find on the object you want to remove, which will link it directly to the root of its structure, and then erase that link.

kragen · on July 22, 2022

Union find takes as input some undirected graph <N, E> and internally constructs (and progressively mutates) a directed graph <N, E'> which it uses to efficiently answer queries about whether two nodes n₁, n₂ ∈ N are in the same connected component of <N, E>. It additionally supports incrementally adding edges to E.

My quest was to find a way to incrementally delete edges from E, not E'. You're talking about deleting edges from E', which I agree is not generally a useful thing to do.

For example, your N might be the cells of a maze map, and your E might be the connections between adjacent cells that are not separated by a wall. In that case you can tear down a wall and add a corresponding edge to E. But it would be nice in some cases to rebuild a wall, which might separate two previously connected parts of the maze. I was looking for an efficient way to handle that, but I didn't find one.

jtolmar · on July 22, 2022

I don't think there's a way to extend union-find to do what you want in the general case. You might have more luck starting with a minimum cut algorithm.

For the specific case where you can identify some of your edges are more or less likely to be deleted, though, you can run union-find on only the stable edges, cache that result, and then do the unstable edges. Whenever an unstable edge is deleted, reload the cache and redo all the unstable edges. This works for something like a maze with open/closable doors.

kragen · on July 22, 2022

Those are some interesting ideas, thanks! I hadn't thought about trying to apply minimum-cut to the problem!

thaumasiotes · on July 22, 2022

> Union find takes as input some undirected graph <N, E> and internally constructs (and progressively mutates) a directed graph <N, E'> which it uses to efficiently answer queries about whether two nodes n₁, n₂ ∈ N are in the same connected component of <N, E>.

I don't think this is a valuable way to think about the structure. That's not what it's for.

A union-find is a direct representation of the mathematical concept of an equivalence relation. The input is a bunch of statements that two things are equal. It will then let you query whether any two things are or aren't equal to each other.

This is captured well by the more formalized name "disjoint-set structure". You have a set of objects. You can easily remove an element from the set. But it makes no conceptual sense to try to "separate two of the elements in a set". They're not connected, other than conceptually through the fact that they are both members of the same set.

A union-find is a good tool for answering whether two vertices are in the same connected component of a graph because being in the same connected component of a graph is an equivalence relation, not because the union-find is a graph-related concept. It isn't, and there is no coherent way to apply graph-theoretical concepts to it.

Putting things another way:

You describe a union-find as something used to "efficiently answer queries about whether two nodes n₁, n₂ ∈ N are in the same connected component of [a graph]".

But you focused on the wrong part of that statement. What makes a union-find appropriate for that problem is the words "the same", not the words "connected component".

kragen · on July 22, 2022

You are welcome to think about the union-find structure however you prefer to think about it, but I was describing the problem I was trying to solve, for which the correct description of union find I gave above is optimal. Contrary to your mistaken assertion, no contradictions arise from applying graph-theoretical concepts in this way; there is no problem of "coherency". It's just a form of description you aren't accustomed to.

If your way of thinking about union find makes it hard for you to understand the problem I was trying to solve, maybe it isn't the best way to think about it for the purpose of this conversation, even if it is the best way to think about it in some other context.

I'm not claiming my description is the only correct description of union find, just that it's the one that most closely maps to the problem I was trying to solve.

The description can equally well be formulated in the relational terms you prefer: given a relation R, union find can efficiently tell you whether, for any given x and y, (x, y) ∈ R', where R' is the symmetric transitive reflexive closure of R (and is thus the smallest equivalence relation containing R). It efficiently supports incrementally extending R by adding new pairs (a, b) to it.

This is equivalent to my graph-theoretic explanation above except that it omits any mention of E'. However, it is longer, and the relationship to the maze-construction application is less clear. Perhaps you nevertheless find the description clearer.

What I was looking for, in these terms, is a variant of union find that also efficiently supports incrementally removing pairs from R.

Does that help you understand the problem better?

— ⁂ —

In general there is a very close correspondence between binary relations and digraphs, so it's usually easy to reformulate a statement about relations as an equivalent statement about digraphs, and vice versa. But one formulation or the other may be more perspicacious.

More generally still, as you move beyond the most elementary mathematics, you will learn that it's usually a mistake to treat one axiom system or vocabulary as more fundamental than another equivalent axiom system, since you can derive either of them from the other one. Instead of arguing about which one is the right axiom system or vocabulary, it's more productive to learn to see things from both viewpoints, flexibly, because things that are obvious from one viewpoint are sometimes subtle from the other one.

thaumasiotes · on July 23, 2022

> You are welcome to think about the union-find structure however you prefer to think about it, but I was describing the problem I was trying to solve, for which the correct description of union find I gave above is optimal.

> If your way of thinking about union find makes it hard for you to understand the problem I was trying to solve, maybe it isn't the best way to think about it for the purpose of this conversation, even if it is the best way to think about it in some other context.

I'm not having any problems understanding the problem you were trying to solve. My problems are in the area of understanding why you thought a union-find might be an effective way to address that problem.

I'm telling you that, if you adjust your view of what a union-find is, you will have a better conception of where it can and can't be usefully applied.

> In general there is a very close correspondence between binary relations and digraphs, so it's usually easy to reformulate a statement about relations as an equivalent statement about digraphs, and vice versa. But one formulation or the other may be more perspicacious.

In this case, the relation is symmetric by definition, so you just have graphs. Yes, it's obvious how you can use a graph to describe a relation.

But the entire point of the union-find structure is to discard any information contained in an equivalent graph that isn't contained in the relation. (And there are many equivalent graphs, but only one relation!) The absence of that information is what makes the structure valuable. It shouldn't be a surprise that, if you want to preserve information from a particular graph, you are better served with graph-based algorithms.

kragen · on July 23, 2022

Well, that's an interesting way to look at it; I see better where you're coming from now.

But I wasn't trying to get the unmodified data structure to handle deletion efficiently; I was trying to use it as a basis to design a structure that could. I thought I saw a minor modification that achieved this, but then I found out why it doesn't work. That's what my linked note above is about.

More generally, if you only try things that will probably work, you'll never achieve anything interesting.

MichaelMoser123 · on July 22, 2022

Can you give some examples where union-find is applied to great benefit?

fumblebee · on July 22, 2022

A while back I had to perform an entity resolution task on several million entities.

We arrived at a point in the algorithm where we had a long list of connected components by ids that needed to be reduced into a mutually independent set of components, e.g. ({112, 531, 25}, {25, 238, 39, 901}, {43, 111}, ...)

After much head banging about working out way to do this that wouldn't lead to an out-of-memory error, we found the Disjoint Set Forest Data Structure and our prayers were answered.

I was very grateful for it!

BeetleB · on July 22, 2022

Exactly the same for me. We had a defect management system, and a defect can be connected to others as dupes. We'd have whole chains of dupes. We used union find to identify them.

superjan · on July 22, 2022

One example is connected component analyis on images. In one linear scan of an image (2d or 3d) you can find all connected blobs in an image.

jobigoud · on July 22, 2022

I used it to find and track islands of connected polygons in 3D meshes.

clashmeifyoucan · on July 23, 2022

a textbook example of an algorithm that works very well with union-find is the kruskal's algorithm to find the minimum weight spanning tree given a graph. Using union-find improves the time complexity of the algorithm from O(V²) to O(ElogV).

This happens because kruskal's algorithm essentially selects the cheapest edge not already included in our spanning tree that won't cause a cycle. So union-find is able to speed up this potential cycle check which would otherwise be naively quadratic.

zimpenfish · on July 22, 2022

Oh, huh, this is serendipitously handy because I need to be doing something like that this weekend (translating from Go to Rust - I already have a reparenting solution using hashes but hopefully this is easier / better.)

hexomancer · on July 22, 2022

I have always wanted to really understand this data structure. Sure, I can follow the analysis with the potential functions and all, but I never really understood how Tarjan came up with the functions in the first place. Does anybody have a resource which intuitively explains the analysis?

hashingroll · on July 22, 2022

This might a long read but it is well-written reader-friendly analysis of Tarjan's proof with chain of reasoning. https://codeforces.com/blog/entry/98275

sn9 · on July 22, 2022

Robert Sedgewick has a great explanation in his Algorithms book. He has gorgeous diagrams alongside his explanations.

hexomancer · on July 22, 2022

Is it in an old edition? I just downloaded an extremely legitimate version of Algorithms 4th edition but it has no section on union data structure.

sn9 · on July 22, 2022

It should be in Chapter 1 Section 5 of the 4th edition. It's even in the table of contents.

hexomancer · on July 23, 2022

Sorry, my bad. I searched for the word "disjoint" expecting it to be prominent in this section but somehow this word only appears 3 times in this book non of which are in this section!

Anyway, while the figures in this work were indeed gorgeous, it was not what I was looking for. It did not even contain a non-intuitive analysis of the inverse ackermann complexity, much less an intuitive one!

simonlindholm · on July 22, 2022

Here's a writeup I made a while ago: https://www.overleaf.com/read/hjbshsqmjwpg

baby · on July 22, 2022

We use it to wire different outputs and inputs to gates in a zero-knowledge proof circuit!

endgame · on July 22, 2022

I have heard stories about annotating the edges with elements from a group rather than using unlabelled edges. Have you heard anything about that?

eutectic · on July 22, 2022

You can efficiently maintain count/max/sum/hash/etc. for each component. It's occasionally useful.

rag-hav · on July 22, 2022

I wrote a beginner's guide on this topic a while ago https://medium.com/nybles/efficient-operations-on-disjoint-s...

IanCal · on July 22, 2022

This sounds super useful thanks! I think I'll have something this can be used for, with individual claims of whether two elements are in the same set, then easily getting the largest set of equal items.

kizer · on July 22, 2022

If at some point you have a cycle, you can then reparent and remove the cycle, right? This structure in general then can encode transitive relations effectively?

Something like “a and b …” “b and c …” “c and a …”.

thaumasiotes · on July 22, 2022

> If at some point you have a cycle, you can then reparent and remove the cycle, right?

You'd never have a cycle. The way a cycle would theoretically arise would have to be joining something to its own child. But you don't naively join the two objects you're given - you find their root objects, and join those. If - as would be necessary for a cycle to form - they have the same root, you can return without doing anything.

If we call

    unite(a,b)
    unite(b,c)
    unite(c,a)

then we should end up with this structure:

        c
       / \
      b   a

With parents on the right, the structure will be a-b after the first call and a-b-c after the second call. The reason we end up with a shallower tree after the third call is that when a is passed to unite, we call find on it and it gets reparented directly to c. Then, c (the representative for c) and c (the representative for a) are already related, so we don't adjust the structure further.

> This structure in general then can encode transitive relations effectively?

Well... no. This structure embodies the concept of an equivalence relation. You wouldn't be able to use it to express a general transitive relation, only a transitive relation that is also symmetric and reflexive.

kizer · on July 24, 2022

Thanks!

JamesUtah07 · on July 22, 2022

Is there a name for the data structure/algorithm?

whoisburbansky · on July 22, 2022

It's called union-find, but also disjoint-set sometimes. The process of reparenting nodes is called path compression, and the optimization of using the larger set of the two as the parent when merging is usually just called "union by rank." Any of those terms should give you more details upon search.

lenocinor · on July 22, 2022

https://en.m.wikipedia.org/wiki/Disjoint-set_data_structure

anewpersonality · on July 22, 2022

I remember getting this in an interview without ever coming across it.. basically, fuck union find.

mentater · on July 22, 2022

Wouldn't that be extremely useful and a polynomial solution for 3-SAT, which is NP complete?

jcranmer · on July 22, 2022

I think I understand why you're confused here. To the extent that union-find is similar to SAT, it's similar to 2-SAT: the constraints you're reasoning about are defined only on pairs of elements, whereas 3-SAT is fundamentally reasoning about triplets (at least one of these three variables is true). Notably, 2-SAT is a variant of SAT that is polynomial in time, unlike 3-SAT.

HelloNurse · on July 22, 2022

SAT complexity results from blowing up a reasonable number of variables/clauses (very rarely equivalent to others) in the formula to an enormous number of variable assignments (too many to store, and having only three classes, formula true or false or not evaluated so far, that are not useful equivalencies). What disjoint sets are you thinking of tracking?

xigoi · on July 22, 2022

How is it a solution to 3-SAT?

loxias · on July 22, 2022

(Fantastic post idea OP. One of the best I've ever seen :D)

Related to bloom filters, xor filters are faster and more memory efficient, but immutable.

HyperLogLog is an efficient way to estimate cardinality.

Coolest thing I've learned recently was Y-fast trie. If your dataset M is bounded integers (say, the set of all 128 bit numbers), you get membership, predecessor, or successor queries in log log time, not log, like in a normal tree.

see: https://www.youtube.com/playlist?list=PLUl4u3cNGP61hsJNdULdu... (6.851 Advanced Data Structures, Erik Demaine)

Would love to learn more "stupidly fast at the cost of conceptual complexity" things.

edit:

(adding) I don't know a name for it, because it's not one thing but a combination, but once can:

Use the first N bits from a very quick hash of a key from an unknown distribution (say a file path, or a variable name, or an address, or a binary blob,..) as a way to "shard" this key across M other fast data structures (like a tree) for search/add/remove. By changing M, you can tune the size of the terms in the O(1)+O(log) equation for running time.

Trees getting too deep for fast search? Every increase of N moves the computation from the log search of the tree to the space tradeoff of having more trees.

Added benefit is you can scale to multiple threads easily. Instead of locking the whole tree, you lock a tiny sub-tree.

Very clever. (I first saw in the Aerospike key value store)

lorenzhs · on July 22, 2022

If you enjoyed XOR filters, you might also like ribbon filters, something that I had the pleasure of working on last year. They share the basic idea of using a system of linear equations, but instead of considering 3 random positions per key, the positions to probe are narrowly concentrated along a ribbon with a typical width of 64. This makes them far more cache-efficient to construct and query.

By purposefully overloading the data structure by a few per cent and bumping those items that cannot be inserted as a result of this overloading to the next layer (making this a recursive data structure), we can achieve almost arbitrarily small space overheads: <1% is no problem for the fast configurations, and <0.1% overhead with around 50% extra runtime cost. This compares to around 10% for XOR filters and ≥ 44% for Bloom filters.

In fact, I'm going to present them at a conference on Monday - the paper is already out: https://drops.dagstuhl.de/opus/volltexte/2022/16538/pdf/LIPI... and the implementation is at https://github.com/lorenzhs/BuRR/. I hope this isn't too much self-promotion for HN, but I'm super hyped about this :)

loxias · on July 23, 2022

This made my day, thank you so much. :) Your explanation makes sense and it sounds brilliant/clever yet obvious. (like many good ideas are)

I'm reading the paper and looking at your github now, and look forward to "github/academia" stalking you in the future. Skimming your list of repositories and seeing a lot of stuff I understand and could possibly use. ;-)

(I find it to be a useful strategy to, when I find a clever concept in a paper, or in code on github, then look at all the other things done by the same person, or the works of people they co-author with. "collaborative filtering" for ideas.)