<aside> 💡 Hey fellow devs! I'm Kyohei, cofounder of a privacy tech startup called Hyde.

I made a Form Builder like Typeform where nobody can see your answers in clear text.

For context, I have a pet peeve with OAuth: I don't want to express my authorization by giving away my data. Once you authorize via OAuth, you have no way to be sure that your data isn't used in unintended ways or passed on to another party. “Revoking” in OAuth terms doesn't mean anything for data that has already been given away.

And I have a solution for this: not providing data!

There's a strong assumption that data has to be handed over in order to be processed. This has traditionally been valid, because computation could not happen anywhere other than "their server", where the programs are deployed.

Our idea is to reverse this. You don't transmit and expose your data to them. They disclose their computation to you, by submitting a Docker image to us.

I made a proof of concept for this. You can create a form just like with Google Forms or Typeform, but you won't be able to read the respondents' answers. To access the data, you need a Docker image. The image is executed on our end, with the JSON files of the responses mounted inside it. It is expected to write its results to a specific file, which we read and return via the API.

Any feedback would be warmly welcome.

</aside>

<aside> 📩 Contact me at @tnzk on Twitter or @tnzk:matrix.org.

</aside>

Get Started with the Example Docker Image

Sign up for Hyde.
- This also requires signing up for Parcel, which we rely on for confidential computing.
Go to https://hyde.to/c/forms (currently not linked from anywhere on the site).
Create a form and submit some answers. Note that it takes less than 10 minutes for an answer to be embedded and become available.
Try it via Postman.
- https://www.postman.com/hydehq/workspace/form-builder-like-typeform-but-nobody-can-see-your-answers-in-clear-text
- Use POST /api/0.3/forms/[slug]/analysis to submit a Docker image to be run. In the Postman workspace, the image is prepopulated with tnzk/hyde-form-example. You can bring your own image. The call returns a jobId that you use to query the status of the container.
- Use GET /api/0.3/forms/[slug]/analysis/[executionId] to check the status of the deployed container. You can poll it at the interval it returns in the waitInSec field (please, please respect the interval: this technical PoC handles load so naively that you can kill us easily). Once the container has completed, it returns the total word count as the result.
Optional: We’re planning to provide a way to get a quick grasp of the responses in an aggregated, non-personally-identifiable way, but it’s still a work in progress. See embedding explorer. There, you can explore the responses in terms of the distribution of their embeddings, segmented by a field you choose from all the non-anonymised fields. We’re planning to provide another API to obtain these embeddings programmatically, but not for now.
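
The submit-then-poll steps above (POST, then repeated GET) can be sketched in Python. This is a minimal sketch, not an official client. The endpoint path and the waitInSec field come from the steps above; the `status` field name, the use of "Succeeded" and "Failure" as terminal values, the fallback interval and the `fetch_status` callable are assumptions made for illustration. The HTTP call is injected so the loop does not guess at how requests are authenticated.

```python
import time
from typing import Any, Callable, Dict

# Assumption: the response carries a `status` field with these terminal values.
TERMINAL_STATUSES = {"Succeeded", "Failure"}
# Hypothetical fallback interval, used only if waitInSec is missing.
DEFAULT_WAIT_SEC = 10


def poll_analysis(
    fetch_status: Callable[[], Dict[str, Any]],
    sleep: Callable[[float], None] = time.sleep,
    max_polls: int = 120,
) -> Dict[str, Any]:
    """Call fetch_status() until the job reports a terminal status.

    fetch_status is expected to perform
    GET /api/0.3/forms/[slug]/analysis/[executionId]
    and return the decoded JSON body as a dict.
    """
    for _ in range(max_polls):
        body = fetch_status()
        if body.get("status") in TERMINAL_STATUSES:
            return body
        # Respect the interval the server asks for before polling again.
        sleep(body.get("waitInSec", DEFAULT_WAIT_SEC))
    raise TimeoutError(f"job did not finish within {max_polls} polls")
```

`fetch_status` can be any function that performs the GET request (for example one built on urllib.request or requests) and returns the parsed JSON. Capping the number of polls keeps a stuck job from being polled forever.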

Write a Custom Docker Image

So far we’ve seen how to run the example Docker image to count the words in all the responses. You can build custom Docker images to implement more interesting analyses, such as word frequency, sentiment analysis or the like.

You can write Docker images as usual. The example we’ve seen is implemented like:

import hyde

def stringify(v):
  # Convert one answer to text according to its type.
  kind = v['type']
  f = {
    'text': lambda: v['value'],
    'number': lambda: str(v['value']),
    'choices': lambda: ' '.join(v['value'])
  }[kind]
  return f()

try:
  word_count = 0
  # Responses are mounted as response-*.json files in the input directory.
  response_files = [name for name in hyde.listdir_input() if name.startswith('response-') and name.endswith('.json')]

  for path in response_files:
    response = hyde.read_json(path)
    for answer in response['answers']:
      s = stringify(answer)
      # split() without arguments skips empty strings and repeated whitespace.
      word_count += len(s.split())

  result = { 'message': f"Responses have {word_count} words in total" }
  hyde.write_json('result.json', result)

except Exception as e:
  # Report application-level errors through error.json.
  hyde.write_json('error.json', {'error': repr(e)})

As you can see, this is a fairly ordinary Python script; the only unusual part is the hyde module. It provides several helpers for working in the survey runtime, though using it is optional.

The key points are:

Responses are mounted as files in a specific directory, /parcel/data/in. Every response is provided as a JSON file. hyde.listdir_input() gives you a list of the files in the input directory, and hyde.read_json(path) reads one of them and parses it as JSON. You can do the same thing with the standard os and json modules.
We collect output from /parcel/data/out/result.json on the success path, or ./error.json on the failure path. Note that an exception is an application-level error, so the status would still be Succeeded, whereas a status of Failure implies an error at the container level.
Any Docker image is okay. The runtime expects CMD to be [] and passes run.py by default, which you can override if you like.
All images need to be reviewed by us in advance. Please send yours to me via email, Twitter or Discord. This is cumbersome, but without it anyone could read the data, write it as-is to the output file, and break privacy easily. At the moment, I manually check your code (💪😹💪) to ensure this isn’t the case. Ultimately we would automate this by employing something like PrivGuard.
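
The file-based contract in the points above can also be followed without the hyde module. The following is a minimal sketch using only the standard library. The directory paths, the response- file naming and the answers/type/value layout are taken from the example above; the function names and the word-frequency analysis are illustrative assumptions, and error reporting via error.json is left out for brevity.

```python
import json
import os
from collections import Counter

IN_DIR = "/parcel/data/in"    # where the responses are mounted
OUT_DIR = "/parcel/data/out"  # where result.json is collected from


def read_responses(in_dir=IN_DIR):
    """Parse every response-*.json file found in the input directory."""
    responses = []
    for name in sorted(os.listdir(in_dir)):
        if name.startswith("response-") and name.endswith(".json"):
            with open(os.path.join(in_dir, name), encoding="utf-8") as f:
                responses.append(json.load(f))
    return responses


def answer_text(answer):
    """Flatten one answer to text, roughly mirroring the example's stringify()."""
    value = answer["value"]
    if answer["type"] == "choices":
        return " ".join(value)
    return str(value)


def word_frequencies(responses, top=10):
    """Count lower-cased words across all answers and keep the most common."""
    counts = Counter()
    for response in responses:
        for answer in response["answers"]:
            counts.update(answer_text(answer).lower().split())
    return counts.most_common(top)


def main(in_dir=IN_DIR, out_dir=OUT_DIR):
    """Read the mounted responses and write the aggregated result file."""
    result = {"frequencies": word_frequencies(read_responses(in_dir))}
    with open(os.path.join(out_dir, "result.json"), "w", encoding="utf-8") as f:
        json.dump(result, f)
```

Calling main() at the end of run.py would run the analysis inside the container. Keeping the directories as parameters makes the script easy to try locally against a folder of sample JSON files before submitting the image for review.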

Example code:

https://github.com/MortyDAO/hyde-form-example