A Story About Vulnerability Research and Early Detection


This is a collaborative post between Databricks and Orca Security. We thank Yanir Tsarimi, Cloud Security Researcher at Orca Security, for their contribution.

Databricks’ top priority is safeguarding our customer data. As part of our defense-in-depth approach, we work with security researchers in the community to proactively discover and remediate potential vulnerabilities in the Databricks platform so that they can be fixed before they become a risk to our customers. We do this through our own bug bounty program and third-party penetration testing contracts.

We know that security is also top-of-mind for customers across the globe. So, to showcase our efforts, we’d like to share a joint blog on a recent experience working with one of our security partners, Orca Security.

Orca Security, as part of an ongoing research effort, discovered a vulnerability in the Databricks platform. What follows below is Orca’s description of their process to discover the vulnerability, Databricks’ detection of Orca’s actions, and the vulnerability response.

The vulnerability
In the Databricks workspace, a user can upload files and clone Git repositories to work with them inside a Databricks workspace. These files are stored in Databricks-managed cloud storage (e.g., AWS S3 object storage), in what Databricks refers to as a “file store.” Orca Security’s research focused on Databricks features that work with these uploaded files – specifically Git repository actions. One particular feature had a security issue: the ability to upload files to Git repositories.

The upload is performed in two different HTTP requests:

  1. The user’s file is uploaded to the server. The server returns a UUID file name.
  2. The upload is “finalized” by submitting the UUID file name(s).

This process makes sense for uploading multiple files. Looking at the request sent to the server when confirming the file upload, the researcher noticed that the HTTP request is sent with three parameters:

"path": "the path from the first step",
"name": "file name to create in the git repo",
"storedInFileStore": false
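As a rough sketch of the two-step flow described above (the endpoint paths and helper names here are illustrative assumptions, not the actual Databricks API), the client first uploads the file, receives a UUID name, and then sends the finalize request carrying the three parameters:

```python
import json
import uuid

def simulate_server_upload_response() -> str:
    # Step 1 (server side, simulated): the uploaded file is stored and the
    # server responds with a UUID file name for the client to reference.
    return str(uuid.uuid4())

def build_finalize_request(uuid_name: str, repo_filename: str) -> dict:
    # Step 2: the upload is "finalized" by submitting the UUID name(s)
    # along with the file name to create inside the Git repository.
    return {
        "path": uuid_name,           # the path from the first step
        "name": repo_filename,       # file name to create in the git repo
        "storedInFileStore": False,
    }

uuid_name = simulate_server_upload_response()
finalize = build_finalize_request(uuid_name, "notebook.py")
print(json.dumps(finalize))
```

This mirrors the request body shown above; the `storedInFileStore` flag is the parameter the researcher went on to investigate.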

The last parameter caught the Orca researcher’s eye. They already knew the “file store” is actually the cloud provider’s object storage, so it was not clear what “false” meant.

The researcher fiddled with the upload requests and determined that when the file upload is “confirmed” after the first HTTP request, the uploaded file is stored locally on disk under “/tmp/import_xxx/…”. The temporary import directory gets prepended to the uploaded file name. The Orca researcher needed to determine whether they could execute a directory traversal attack; this involved sending a request with a local file name such as “../../../../etc/issue” and seeing if it worked. It didn’t. The backend checked for traversals and didn’t allow the Orca researcher to complete the upload.

While Databricks had prevented this attack, further attempts showed that although using relative paths to traverse didn’t work, the system did have a vulnerability: it permitted the researcher to supply an absolute path such as “/etc/issue.” After uploading this file, the researcher verified the file contents via the Databricks web console, an indication that they might be able to read arbitrary files from the server’s filesystem.
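The class of bug described here can be sketched in a few lines. This is not Databricks’ actual code, just a minimal illustration of how a path join can reject “../” traversal yet still be escaped by an absolute path, because `os.path.join` discards the base directory when the second argument is absolute:

```python
import os

IMPORT_DIR = "/tmp/import_xxx"  # temporary import directory from the post

def resolve_upload_path(filename: str) -> str:
    """Prepend the import directory, rejecting relative traversal."""
    if ".." in filename.split("/"):
        raise ValueError("traversal rejected")
    # os.path.join silently drops IMPORT_DIR when filename is absolute,
    # which is the behavior the researcher exploited.
    return os.path.normpath(os.path.join(IMPORT_DIR, filename))

# Relative traversal is blocked:
try:
    resolve_upload_path("../../../../etc/issue")
    blocked = False
except ValueError:
    blocked = True

# ...but an absolute path escapes the import directory entirely:
escaped = resolve_upload_path("/etc/issue")
print(blocked, escaped)  # True /etc/issue
```

A common fix is to resolve the joined path and verify it still starts with the intended base directory before using it.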

To understand the severity of this issue without potentially compromising customer data, the Orca Security researcher carefully tried reading files under “/proc/self”. The researcher determined that they would be able to obtain certain information by reading environment variables from “/proc/self/environ.” They ran a script iterating against “/proc/self/fd/XX,” which yielded read access to open log files. To ensure that no data was compromised, they paused the attack to alert Databricks of the findings.
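The reconnaissance step above can be approximated as follows. This sketch enumerates the current process’s own open file descriptors via the Linux procfs, the same mechanism the researcher iterated over (it inspects only the local process and returns nothing on non-Linux systems):

```python
import os

def list_open_fds() -> list:
    """Return (fd, target) pairs for this process's open file descriptors."""
    fd_dir = "/proc/self/fd"
    results = []
    if not os.path.isdir(fd_dir):  # procfs unavailable (e.g., non-Linux)
        return results
    for fd in os.listdir(fd_dir):
        try:
            # Each entry is a symlink to the underlying file, pipe, or socket.
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            continue  # the fd may have been closed while iterating
        results.append((fd, target))
    return results

for fd, target in list_open_fds():
    print(fd, "->", target)
```

In the researcher’s position, entries pointing at open log files are what made “/proc/self/fd/XX” a useful read target.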

Databricks’ detection and vulnerability response
Prior to notification, Databricks had already rapidly detected the anomalous behavior and begun to investigate, contain, and take countermeasures to repel further attacks – and contacted Orca Security even before Orca was able to report the issue to Databricks.

The Databricks team, as part of its Incident Response procedures, was able to identify the attack and vulnerability and deploy a fixed version within just a few hours. Databricks also determined that the exposed environment information was no longer valid in the system at the time of the research. Databricks even identified the source of the requests and worked diligently with Orca Security to validate their detections and actions and further protect customers.

Orca Security would like to applaud the Databricks security team’s efforts; to this day, this is the only time we’ve been detected while researching a system.