Getting Started with S3 Source Configuration

Requirements:

  • Active AWS account with S3 access.
  • Appropriate IAM permissions to access the desired S3 buckets.
  • A Databrain Workspace to which you want to connect the data.

Setup Guide:

  1. Ensure Bucket Accessibility:
    • Make sure your S3 bucket is active and accessible from Databrain.
    • This depends on your AWS account settings and bucket permissions.
  2. Grant Necessary Permissions:
    • Read Access on Buckets and Objects: Grant read access to each S3 bucket you want to sync and to the objects in it.
  3. Fill In Connection Info:
    • Provide the following information to connect to your S3 bucket:
      • Destination Name: A custom name to identify this connection in Databrain.
      • S3 Region: The AWS region where your S3 bucket is located (e.g., us-east-1).
      • S3 Access Key ID: Your AWS Access Key ID for authentication.
      • S3 Secret Access Key: Your AWS Secret Access Key associated with the Access Key ID.
      • S3 Bucket Dataset Folder Path: The specific folder path within your bucket (e.g., awss3_folder_test_less/).
      • S3 Bucket Name: The name of your S3 bucket (e.g., databrain-s3-test-csv).
      • Table Level: Select whether to interpret data at the Folder or File level.
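
Before entering these values in Databrain, it can help to sanity-check them locally. The following is a minimal sketch using only the Python standard library. The class name S3ConnectionInfo and its field names are hypothetical and simply mirror the form fields above; they are not part of any Databrain API. The bucket-name pattern follows the AWS naming rules for general purpose buckets, the region pattern is only a loose shape check, and the leading-slash check is an assumption based on the example path in this guide.

```python
import re
from dataclasses import dataclass
from typing import List

# AWS general purpose bucket names: 3-63 characters, lowercase letters, digits,
# dots, and hyphens; must start and end with a letter or digit.
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")
# Loose shape check for region codes such as us-east-1 or ap-southeast-2.
_REGION_RE = re.compile(r"^[a-z]{2}(-[a-z]+)+-\d+$")


@dataclass
class S3ConnectionInfo:
    """Hypothetical container mirroring the Databrain connection form."""
    destination_name: str
    region: str
    access_key_id: str
    secret_access_key: str
    folder_path: str
    bucket_name: str
    table_level: str  # "Folder" or "File"

    def problems(self) -> List[str]:
        """Return human-readable issues; an empty list means nothing obvious is wrong."""
        issues = []
        if not self.destination_name.strip():
            issues.append("Destination Name is empty")
        if not _REGION_RE.match(self.region):
            issues.append(f"S3 Region {self.region!r} does not look like a region code")
        if not self.access_key_id or not self.secret_access_key:
            issues.append("Access Key ID and Secret Access Key are both required")
        if not _BUCKET_RE.match(self.bucket_name):
            issues.append(f"S3 Bucket Name {self.bucket_name!r} breaks AWS naming rules")
        # Assumption: the path is relative to the bucket, as in the example above.
        if self.folder_path.startswith("/"):
            issues.append("Folder path should not start with '/'")
        if self.table_level not in ("Folder", "File"):
            issues.append("Table Level must be 'Folder' or 'File'")
        return issues


# Usage with the example values from this guide (the keys are placeholders).
info = S3ConnectionInfo(
    destination_name="s3-csv-source",
    region="us-east-1",
    access_key_id="AKIAEXAMPLEKEYID",
    secret_access_key="example-secret",
    folder_path="awss3_folder_test_less/",
    bucket_name="databrain-s3-test-csv",
    table_level="Folder",
)
print(info.problems())
```

These checks only catch malformed input. They do not confirm that the credentials are valid or that the bucket is reachable from Databrain.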

Permissions:

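  • Permission to list bucket contents (the s3:ListBucket action on the bucket).
  • Permission to read objects from the specified bucket (the s3:GetObject action on its objects).
  • If using KMS encryption, permission to use the KMS key for decryption (the kms:Decrypt action on the key).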
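
The permissions above could be expressed as an IAM policy along the following lines. This is a sketch, not a policy published by Databrain: the helper name build_read_only_policy is hypothetical, the ARNs assume the standard "aws" partition, and your organization may require tighter conditions. It grants s3:ListBucket on the bucket, s3:GetObject on objects under the dataset folder, and optionally kms:Decrypt on a single key.

```python
import json


def build_read_only_policy(bucket_name, folder_path="", kms_key_arn=None):
    """Build an IAM policy document granting list and read access to one bucket.

    Hypothetical helper for illustration; adjust to your own security policies.
    """
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    statements = [
        {
            # Listing is a bucket-level action, so the resource is the bucket itself.
            "Sid": "ListBucket",
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": [bucket_arn],
        },
        {
            # Reading is an object-level action, scoped here to the dataset folder.
            "Sid": "ReadObjects",
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": [f"{bucket_arn}/{folder_path}*"],
        },
    ]
    if kms_key_arn:
        # Only needed when objects are encrypted with a KMS key.
        statements.append(
            {
                "Sid": "DecryptWithKms",
                "Effect": "Allow",
                "Action": ["kms:Decrypt"],
                "Resource": [kms_key_arn],
            }
        )
    return {"Version": "2012-10-17", "Statement": statements}


# Usage with the example bucket and folder from this guide.
policy = build_read_only_policy("databrain-s3-test-csv", "awss3_folder_test_less/")
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the IAM policy editor and attached to the user whose access keys you give to Databrain.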

Locating the Configuration Details in AWS S3

  1. Destination Name:
    • Choose a descriptive name for this connection within Databrain.
  2. S3 Region:
    • Log in to the AWS Management Console and open the S3 service.
    • Select your bucket, and find the region information in the bucket’s “Properties” tab.
  3. S3 Access Key ID & Secret Access Key:
    • Access keys are generated in the IAM (Identity and Access Management) section of AWS.
    • Navigate to IAM, select the desired user, open the “Security credentials” tab, and create or manage access keys there.
  4. S3 Bucket Dataset Folder Path:
    • Navigate to your bucket in the S3 console and note the specific folder path you wish to sync.
  5. S3 Bucket Name:
    • This is the name of your S3 bucket, visible in the S3 dashboard of the AWS Management Console.
  6. Table Level:
    • Determine whether your data should be interpreted at the folder level or file level based on your S3 bucket structure and data organization.
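
To make the Folder versus File choice more concrete, the sketch below groups a list of object keys into tables in two ways. It illustrates one plausible reading only: this guide does not specify how Databrain maps objects to tables, so the function name group_into_tables and the grouping rules (one table per immediate subfolder, or one table per file) are assumptions.

```python
import posixpath
from collections import defaultdict


def group_into_tables(keys, folder_path, table_level):
    """Group S3 object keys into tables (assumed rules, not Databrain's documented behavior)."""
    if table_level not in ("Folder", "File"):
        raise ValueError("table_level must be 'Folder' or 'File'")
    tables = defaultdict(list)
    for key in keys:
        # Skip keys outside the dataset folder and zero-byte "folder" placeholders.
        if not key.startswith(folder_path) or key.endswith("/"):
            continue
        relative = key[len(folder_path):]
        if table_level == "Folder":
            subfolder, sep, _rest = relative.partition("/")
            if not sep:
                continue  # file sits directly in the dataset folder; ignored in this sketch
            table = subfolder
        else:
            # One table per file, named by its relative path without the extension.
            table = posixpath.splitext(relative)[0]
        tables[table].append(key)
    return dict(tables)


# Hypothetical object keys under the example dataset folder.
keys = [
    "awss3_folder_test_less/orders/2024-01.csv",
    "awss3_folder_test_less/orders/2024-02.csv",
    "awss3_folder_test_less/customers/all.csv",
]
by_folder = group_into_tables(keys, "awss3_folder_test_less/", "Folder")
by_file = group_into_tables(keys, "awss3_folder_test_less/", "File")
print(sorted(by_folder))  # ['customers', 'orders']
print(sorted(by_file))    # ['customers/all', 'orders/2024-01', 'orders/2024-02']
```

Under these assumptions, Folder level suits buckets where each subfolder holds many files with the same schema, while File level suits buckets where every file is its own dataset.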