A group of ex-NSA and Amazon engineers are building a ‘GitHub for data’

About six months ago, a group of engineers and developers who had worked at the National Security Agency, Google and Amazon Web Services had an idea.

Data helps developers and engineers build new features and innovate. But that data is often highly sensitive and out of reach, kept under lock and key by red tape and compliance processes, and getting approval to use it can take weeks. So the engineers started Gretel, an early-stage startup that aims to help developers safely share and collaborate on sensitive data in real time.

It’s not as niche a problem as you might think, said Alex Watson, one of the co-founders. Developers can face this problem at any company, he said. Often, developers don’t need full access to a bank of user data; they just need a portion or a sample to work with. In many cases, developers could make do with data that looks like real user data.

“It starts with making data safe to share,” Watson said. “There’s all these really cool use cases that people have been able to do with data.” He said companies like GitHub, a widely used source code sharing platform, helped to make source code accessible and collaboration easy. “But there’s no GitHub equivalent for data,” he said.

And that’s how Watson and his co-founders, John Myers, Ali Golshan and Laszlo Bock, came up with Gretel.

“We’re building right now software that enables developers to automatically check out an anonymized version of the data set,” said Watson. This so-called “synthetic data” is essentially artificial data that looks and works just like regular sensitive user data. Gretel uses machine learning to categorize the data (names, addresses and other customer identifiers, for example) and to attach as many labels to it as possible. Once the data is labeled, access policies can be applied to it. Then the platform applies differential privacy, a technique used to anonymize vast amounts of data, so that the data is no longer tied to customer information. “It’s an entirely fake data set that was generated by machine learning,” said Watson.
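
To make the two ideas in that description more concrete, here is a minimal Python sketch. It does not show Gretel's software, which the article does not describe in detail. It is an illustration under stated assumptions: the pattern table and the function names (`label_field`, `laplace_noise`, `private_count`) are hypothetical, hand-written rules stand in for the machine-learning classifier, and the differential-privacy step is the textbook Laplace mechanism applied to a simple count, not to generating a full synthetic data set.

```python
import random
import re

# Hypothetical, hand-written patterns. The article says Gretel uses machine
# learning for labeling; regexes are used here only to keep the sketch short.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "phone": re.compile(r"^\+?\d[\d\- ]{6,}\d$"),
}


def label_field(value: str) -> str:
    """Return the first label whose pattern matches, or 'unlabeled'."""
    for label, pattern in PATTERNS.items():
        if pattern.match(value):
            return label
    return "unlabeled"


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample zero-mean Laplace noise with the given scale.

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Count matching records, then add Laplace noise of scale 1/epsilon.

    Adding or removing one record changes a count by at most 1, so this
    noise level gives epsilon-differential privacy for the count.
    """
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    values = ["ana@example.com", "+1 555-123-4567", "hello", "bo@example.org"]
    print([label_field(v) for v in values])
    rng = random.Random(0)
    noisy = private_count(values, lambda v: label_field(v) == "email", 1.0, rng)
    print(f"noisy email count: {noisy:.2f}")
```

A smaller `epsilon` adds more noise and gives stronger privacy at the cost of accuracy. Repeated queries against the same data consume privacy budget, which a real system would need to track.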

It’s a pitch that’s already gathering attention. To get the platform off the ground, the startup has raised $3.5 million in a seed round led by Greylock Partners, with participation from Moonshots Capital, Village Global and several angel investors.

“At Google, we had to build our own tools to enable our developers to safely access data, because the tools that we needed didn’t exist,” said Sridhar Ramaswamy, a former Google executive, and now a partner at Greylock.

Gretel said it will charge customers based on consumption — a similar structure to how Amazon prices access to its cloud computing services.

“Right now, it’s very heads-down and building,” said Watson. The startup plans to ramp up its engagement with the developer community in the coming weeks, with an eye on making Gretel available in the next six months, he said.