Version: v3.0

SageMaker Studios

The Amorphic platform provides integration with AWS SageMaker Studio to accelerate machine learning workflows in SageMaker.

Amazon SageMaker Studio is an integrated development environment (IDE) that provides a single web-based visual interface where you can access purpose-built tools to perform all machine learning (ML) development steps, from preparing data to building, training, and deploying your ML models. You can quickly upload data, create new notebooks, train and tune models, move back and forth between steps to adjust experiments, and deploy models to production without leaving SageMaker Studio. It allows you to quickly switch environments and collaborate seamlessly within your organization to build ML models at scale.

Using SageMaker Studio through Amorphic streamlines the workflow because Amorphic handles the many configurations that a studio requires. Users get the full capabilities of AWS SageMaker and Notebooks for developing machine learning models and pipelines.

Studio Operations

Amorphic Studio supports the following operations.

| Operation | Description |
| --- | --- |
| Create Studio | Create a studio domain and required resources in AWS SageMaker. |
| Update Studio | Update the metadata and resources linked to a studio. |
| Delete Studio | Delete studio components. |
Note
  • Default service quotas:
    • Total domains: 2
    • User Profiles: 2
    • Domains with RStudioServerPro Apps: 1
    • Refer to the service quotas and raise a request with AWS to have the quotas increased to suit your use case.
  • If a service quota is exceeded, studio creation fails with an error similar to this:
    LimitExceededError: Domain-level App [arn:aws:sagemaker:<region>:<>:app/<>/domain-shared/RStudioServerPro/default] failed to start: [The account-level service limit 'RStudioServerPro Apps running on system instances' is 1 Apps, with current utilization of 1 Apps and a request delta of 1 Apps. Please use AWS Service Quotas to request an increase for this quota. If AWS Service Quotas is not available, contact AWS support to request an increase for this quota.].
  • Sharing studios with tags is currently not supported.
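
When a studio fails to create because of a quota, the error text quoted in the note already contains the numbers needed for a quota increase request. The following is a minimal sketch that extracts them; the function name `parse_quota_error` and the regular expression are illustrative assumptions based only on the sample message above, and the wording of AWS error messages may differ or change.

```python
import re
from typing import Optional

# Pattern derived from the sample error in the note above (an assumption:
# AWS may word this message differently in other cases or versions).
_QUOTA_PATTERN = re.compile(
    r"service limit '(?P<name>[^']+)' is (?P<limit>\d+) Apps, "
    r"with current utilization of (?P<used>\d+) Apps "
    r"and a request delta of (?P<delta>\d+) Apps"
)


def parse_quota_error(message: str) -> Optional[dict]:
    """Return quota details from a LimitExceededError message, or None."""
    match = _QUOTA_PATTERN.search(message)
    if match is None:
        return None
    used = int(match.group("used"))
    delta = int(match.group("delta"))
    return {
        "quota_name": match.group("name"),
        "limit": int(match.group("limit")),
        "used": used,
        "requested": delta,
        # Smallest quota value that would have let this request succeed.
        "required_limit": used + delta,
    }


# Abbreviated copy of the sample message from the note above.
sample = (
    "LimitExceededError: Domain-level App failed to start: "
    "[The account-level service limit 'RStudioServerPro Apps running on "
    "system instances' is 1 Apps, with current utilization of 1 Apps and a "
    "request delta of 1 Apps. Please use AWS Service Quotas to request an "
    "increase for this quota.]"
)
details = parse_quota_error(sample)
```

For the sample message, `details["required_limit"]` is the sum of the current utilization and the request delta, which is the minimum value to ask for in the quota increase request.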

Create Studio

To create a Studio:

  1. Click on + Create Data Lab.
  2. Choose whether to select or upload a template, or to create the data lab from scratch.
  3. Select the Data Lab Type as Studio.
  4. Fill in the details shown in the table:
| Attribute | Description |
| --- | --- |
| Data Lab Name | Give your studio data lab a unique name. |
| Description | Describe the studio's purpose and relevant details. |
| Keywords | Add relevant keywords to the studio. |
| Allowed Instances List | Select the list of ML compute instances with which apps can be created in the studio. By default, the cheapest three instance types are used. |
| Volume Size (in GB) | Default storage volume size (in GB) for apps created in the studio. The value must be between 5 GB and 16000 GB. By default, 10 GB is allocated. |
| Max Volume Size (in GB) | Maximum storage volume size (in GB) for apps created in the studio. The value must be between 5 GB and 16000 GB. By default, the maximum is set to 100 GB. |
| Jupyter Lab Instance Type | Select the instance type used to create the Jupyter Lab app in the studio. If not selected, it defaults to the first value in the Allowed Instances List. |
| RStudio Access | Select whether to enable or disable access to the RStudio app in the studio. By default, this option is disabled. |
| Internet Access | Sets whether SageMaker provides internet access to the studio. By default, this option is disabled. |
| Shared Resources Access | Select the shared resources (parameters, shared libraries, domains, etc.) required for the studio. |
| Datasets Access | Select the datasets required for the studio, with read or write access. |
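
The limits and defaults in the table can be checked before a creation request is submitted. The sketch below is illustrative only: the `StudioSettings` class and its field names are hypothetical and are not part of the Amorphic API, and the instance type names in the usage example are placeholders. Only the 5 to 16000 GB range, the 10 GB and 100 GB defaults, the disabled-by-default flags, and the "first value in the Allowed Instances List" fallback come from the table.

```python
from dataclasses import dataclass
from typing import List, Optional

MIN_VOLUME_GB = 5
MAX_VOLUME_GB = 16000


@dataclass
class StudioSettings:
    """Hypothetical container mirroring the attributes in the table above."""

    name: str
    allowed_instances: List[str]
    volume_size_gb: int = 10        # documented default
    max_volume_size_gb: int = 100   # documented default
    jupyter_lab_instance_type: Optional[str] = None
    rstudio_access: bool = False    # disabled by default
    internet_access: bool = False   # disabled by default

    def __post_init__(self) -> None:
        # Both volume attributes share the same documented range.
        for label, value in (
            ("Volume Size", self.volume_size_gb),
            ("Max Volume Size", self.max_volume_size_gb),
        ):
            if not MIN_VOLUME_GB <= value <= MAX_VOLUME_GB:
                raise ValueError(
                    f"{label} must be between {MIN_VOLUME_GB} and "
                    f"{MAX_VOLUME_GB} GB, got {value}"
                )
        # Documented fallback: first value in the Allowed Instances List.
        if self.jupyter_lab_instance_type is None and self.allowed_instances:
            self.jupyter_lab_instance_type = self.allowed_instances[0]


# Usage with placeholder instance types.
settings = StudioSettings(
    name="demo-studio",
    allowed_instances=["ml.t3.medium", "ml.m5.large"],
)
```

Validating locally like this catches an out-of-range volume size immediately instead of after a provisioning attempt.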
Note
  • Studio creation provisions multiple underlying resources and can take around 5-10 minutes to reach InService status.
  • Read access to datasets with Lake Formation as the target location cannot be granted to a studio.
  • Datasets of type View can be attached only under the Datasets Read Access section.
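
Because creation takes several minutes, automation around it usually polls until the studio reports InService. The helper below is a minimal sketch of that polling loop. The `get_status` callable is a hypothetical hook that you supply; it could wrap whatever call returns the studio's current status in your environment. The `"Failed"` state, the 15-minute timeout, and the 30-second interval are assumptions, not documented Amorphic values.

```python
import time
from typing import Callable, Tuple


def wait_for_status(
    get_status: Callable[[], str],
    target: str = "InService",
    failure_states: Tuple[str, ...] = ("Failed",),  # assumed failure state
    timeout_s: float = 900.0,        # assumed: a bit above the 5-10 minute estimate
    poll_interval_s: float = 30.0,   # assumed polling interval
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it returns the target status.

    Raises RuntimeError on a failure state and TimeoutError on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status == target:
            return status
        if status in failure_states:
            raise RuntimeError(f"studio entered failure state: {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"studio did not reach {target}; last status: {status}")
        sleep(poll_interval_s)


# Usage with a stand-in status source instead of a real API call.
_fake_statuses = iter(["Pending", "Pending", "InService"])
final_status = wait_for_status(lambda: next(_fake_statuses), sleep=lambda _s: None)
```

Passing `sleep` as a parameter keeps the loop testable without real waiting; in production use, leave it at the default.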

Studio Details

When a new studio is created, Amorphic creates an AWS SageMaker domain and its underlying resources (user profiles, spaces, and apps) for consumption.

Users can launch the Studio IDE by clicking the Go to Data Lab button on the details page.

The following details are visible on the details page for the studio:

Studio Details

Note

If a user lacks access to any of the underlying Amorphic resources attached to the studio, the URL and the button are unavailable to them, and an error message lists the resources they cannot access.

  • Users cannot create their own spaces within the studio. Amorphic creates collaborative spaces by default for users to use.
  • A default Jupyter Lab app is created for the studio using the Jupyter Lab Instance Type. This is a collaborative app that multiple users can use.
  • If users want to update the configuration or stop the Jupyter Lab app, they can use the Stop space button and then make the necessary changes.
Note
  • Notify all users before stopping the Jupyter Lab app. Users working in the space will lose any work that is in memory or unsaved, and they will need to refresh their page to learn of the shutdown.
  • The Jupyter Lab Instance Type attribute in the studio details shows the default instance type with which the Jupyter Lab app was created. If users modify the configuration from within the studio, this attribute is not updated.
  • Currently, only Jupyter Lab and RStudio (if enabled) can be consumed from the Studio IDE. Users can also launch these apps directly from the Studio Apps tab on the studio details page in Amorphic:

Studio Apps

Using RStudio IDE

To use the RStudio IDE in Studios, you need a valid license provisioned through AWS License Manager. Follow the instructions in the documentation.

  • There are two types of users within RStudio: Admins and Users. The user who creates the studio is an Admin by default.
  • In Amorphic, a user with owner access to the studio is an RStudio Admin, and a user with read-only access to the studio is an RStudio User.
  • Users can access the dashboard using the Admin Dashboard URL available on the Studio Apps page.
  • Admin users can access a dashboard that shows details such as the number of sessions, users, and instance utilization.
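
The access-level rule in the list above can be written as a small lookup. This is only an illustration of the mapping described here: the names `RStudioRole` and `rstudio_role`, and the string keys `"owner"` and `"read-only"`, are hypothetical and do not correspond to a documented Amorphic API.

```python
from enum import Enum


class RStudioRole(Enum):
    ADMIN = "Admin"
    USER = "User"


# Mapping stated in the list above: owner -> Admin, read-only -> User.
_ACCESS_TO_ROLE = {
    "owner": RStudioRole.ADMIN,
    "read-only": RStudioRole.USER,
}


def rstudio_role(access_level: str) -> RStudioRole:
    """Return the RStudio role for an Amorphic studio access level."""
    key = access_level.strip().lower()
    if key not in _ACCESS_TO_ROLE:
        raise ValueError(f"unknown access level: {access_level!r}")
    return _ACCESS_TO_ROLE[key]


owner_role = rstudio_role("owner")
reader_role = rstudio_role("read-only")
```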

RStudio IDE

  • The application can be accessed using the App URL on the Studio Apps page, or from the Studio IDE using the User Profile URL.
  • RStudio sessions can only be created with instances that are specified in the studio's Allowed Instances List.

Delete Studio

Studio deletion can take 10-15 minutes, depending on the number of linked users and apps created in the studio.

Update Studio

Users can modify studio metadata, as well as the Amorphic datasets and domains attached to the studio (in read-only or owner mode), to access them from inside the IDE. Shared resources (parameters, shared libraries, etc.) can also be updated in this manner.

Studio Benefits

  • Amazon SageMaker Studio offers a unified experience for ML development. ML teams can perform the complete ML workflow in a single web-based visual interface.

  • Access to pre-trained ML models, built-in algorithms, and prebuilt ML solutions.

Studio Use Cases

  • Unify your end-to-end ML development in SageMaker Studio with the most comprehensive ML tools all in one place. SageMaker offers high-performing MLOps tools to help you automate and standardize ML workflows and governance tools to support transparency and auditability across your organization.

  • Build foundation models faster in SageMaker Studio with access to a wide range of publicly available models, notebooks backed by high performance compute for fine-tuning, and ability to scale to distributed training directly from Studio notebooks.

  • SageMaker Studio offers a unified experience to perform all data analytics and ML workflows. Create, browse, and connect to Amazon EMR clusters. Build, test, and run interactive data preparation and analytics applications with AWS Glue interactive sessions. Monitor and debug Spark jobs using familiar tools such as Spark UI, all from SageMaker Studio notebooks.