Announcing Our Investment in SkipLabs' Seed Round
We’re delighted to announce Tapestry VC’s investment in Skip’s Seed round alongside Amplify Partners.
Congratulations, Julien and the Skip team, and thank you for your partnership!
Great investment ideas are often lying in plain sight; Jim Simons dubbed them “ghosts”. I should clarify — it’s easy to conflate the clandestine or byzantine with “alpha” (it’s better for our egos this way), but these ideas often belong firmly in Charlie’s “too hard” pile. Don’t forget, Nix has been the 6th most-contributed open source project for years, but it took us some time before we found Flox. Bitcoin is another example here of course, as is the Transformer.
What if I could tell you about a technology that’s had, arguably, just as much impact as PyTorch, React, or GraphQL, but a fraction of the buzz? Or that Yann LeCun told us all to pay attention to it back in 2022? Or, more recently, that 86,000+ folk watched Theo celebrate it? Skip hasn’t been particularly hard to find; we all just had to look in the right places. Get your heads out of Harmonic and into Meta Open Source.
Skip is an open-source framework that enables developers to build and run “reactive” services. “Reactive” means what it sounds like (admittedly rare in infra): certain applications (e.g. Manna’s drone delivery app) need to react in real-time to new data, or “state”, in order to do their thing. Polymarket is another good example here.
This is typically where folk, correctly, ask me “isn’t this what Kafka does?”. It’s this question that gets me so jazzed, as Skip gives developers real-time capabilities like Kafka, but without the fragility and overhead that comes with streams. Many have tried to assail Confluent’s hegemony (you should sub to Muji) by building a better streaming product; instead Skip asked: is there a better way to do real-time more generally? Skip is streaming, without the streams.
Ok, cool, but how? This won’t be a full Why Now primer (I have about 5 of ‘em that I’m currently cooking up), so we’re going to jump in the deep end here (sorry). Skip programs are declarative. Instead of reasoning about data updates directly, you write code that reasons about a “current” snapshot of state, and Skip will automatically process changes, keeping everything in sync. Kafka, on the other hand, is imperative.
In order to keep everything in sync, Skip tracks “dependencies” (read: inputs) in a “computation graph”. The magic here is that when these inputs change, the changes propagate through to the relevant outputs (e.g. our $Vol output in Polymarket) without re-evaluating any computation whose inputs haven’t changed. Perhaps this reminds you of something?
This is a big deal from both a scalability and a performance perspective. Performance-wise, by avoiding unnecessary computation (i.e. re-evaluating state that hasn’t changed), Skip’s approach meaningfully reduces CPU usage. That advantage compounds when you think about how large these systems can become.
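To make that concrete, here’s a toy sketch of the idea (in TypeScript, and emphatically not Skip’s actual API; names like Cell and derive are made up for illustration): outputs subscribe to the inputs they depend on, and an input that hasn’t changed never triggers recomputation downstream.
// toy computation graph (TypeScript, illustrative only)
type Listener = () => void;

class Cell<T> {
  private listeners: Listener[] = [];
  constructor(private value: T) {}
  get(): T { return this.value; }
  set(next: T) {
    if (next === this.value) return; // unchanged input: nothing propagates
    this.value = next;
    this.listeners.forEach((notify) => notify());
  }
  subscribe(listener: Listener) { this.listeners.push(listener); }
}

// derive() wires an output into the graph: it re-runs `fn` only when one
// of its declared dependencies actually changes
function derive<T>(deps: Cell<any>[], fn: () => T): Cell<T> {
  const out = new Cell<T>(fn());
  deps.forEach((dep) => dep.subscribe(() => out.set(fn())));
  return out;
}

// a Polymarket-flavoured example: vol depends only on trades
const trades = new Cell([10, 20]);
const orderBook = new Cell<string[]>([]);
const vol = derive([trades], () => trades.get().reduce((a, b) => a + b, 0));

trades.set([10, 20, 5]); // vol recomputes: 35
orderBook.set(["bid"]); // not one of vol's inputs, so vol is left alone
The real thing is vastly more sophisticated, of course, but the contract is the same: you declare your dependencies, and the graph decides what actually needs re-running.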
Next, and I’m prepared for someone to shout at me for this: declarative programming is, fundamentally, more scalable. Why? Well, imperative programming requires you to be explicit about every piece of logic in your control flow, whereas Skip figures this logic out for you. This is best illustrated in code. Let’s look at some data processing in Kafka vs. Skip:
// kafka
import java.time.Duration;
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.*;
import org.apache.kafka.streams.kstream.*;

// boilerplate: name the app and point it at the brokers
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "complex_app");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

// wire the topology by hand: read two topics, join them within a
// five-second window, write the result to a third topic
StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> stream1 = builder.stream("input_topic1");
KStream<String, String> stream2 = builder.stream("input_topic2");
KStream<String, String> joinedStream = stream1.join(
    stream2,
    (value1, value2) -> value1 + "-" + value2,
    JoinWindows.of(Duration.ofSeconds(5))
);
joinedStream.to("output_topic");

// and explicitly start the streams runtime
KafkaStreams streams = new KafkaStreams(builder.build(), props);
streams.start();
Have fun parsing that! And now, for Skip (come at me, imperative purists):
// skip (simplified)
// Define data sources
const dataSource1 = source("data_source1");
const dataSource2 = source("data_source2");

// Define a combined transformation
const combinedData = combineLatest(dataSource1, dataSource2)
  .map(([data1, data2]) => process(data1, data2));

// Use the combined data
show(combinedData);
That’s kinda self-explanatory, right?
I spent a lot of time thinking about (& was long) $CFLT when ChatGPT first entered the fray, for rather obvious reasons — in the “arms race” that is building foundation models, we want as much novel data as possible, as quickly as possible. Confluent & other streaming products seemed (& tbh, probably still are) well-poised to benefit from this abrupt change. But this also meant I wrote a lot of Kafka code, and, to be honest, I thought it sucked.
At scale, though, frameworks are incredibly sticky, especially those that help your microservices ~reliably do their thing. Accordingly, there are few people who could challenge such a dominant programming pattern, but it certainly helps when you’ve been described as “one of the top two or three programming language designers in the world”.
We couldn’t be more excited to watch this truly elite team flip the entire real-time paradigm on its head. Thank you once again, Julien & team, for having us at Tapestry VC on this journey with you.