Contents


TLDR

Is this really important?

Why Today?

How will Oxen expand into a truly standalone platform?

As a data storage layer, how do you fit in with the Snowflakes of the world?

Who Are We?

Comparisons to Other Tools

Is this similar to git-lfs?

Nice, love to see more Rust in the ML space! How does this compare to DVC?

How does this compare to Dolt?

Functionality Questions

What does Oxen do with ill-formed data? At work, we have a product that produces files that are nominally CSV but are in reality just an event log where each row might have a different set of columns.

Does it handle non-textual structured data formats? HDF, SQLite, or the like?

How well does it handle time series/sequence data? Let's say I want to load N sequences of length T and randomly shuffle them for training. How would this perform if the dataset was hosted on an S3-compatible store? I noticed your DF stuff is backed by Polars; any reason why you went with Polars vs. DataFusion?

Why does Oxen use XXH3?