Use Glue Sessions in Amorphic Studio
You can use Glue Sessions in an Amorphic Datalab studio by running a JupyterLab notebook with the Glue PySpark kernel, which lets you process and transform data interactively.
Usage
- Create a Datalab Studio: Follow the instructions in Create a Studio to set up your Datalab environment.
- Copy the Studio ID: After the studio is created, copy its studio ID. You need it to identify the studio's user profile role in the next step.
- Locate the User Profile Role: In the AWS IAM console, search for the user profile role associated with your Datalab studio. The role name has the format `{ProjectShortName}-custom-{studio_id}-usr-Role`.
- Update Trust Relationship: Add `glue.amazonaws.com` as a trusted service in the trust relationship of the user profile role so that AWS Glue can assume the role.
- Modify Inline Policy: Update the role's custom inline policy to grant the permissions that Glue Sessions need. Add the following statement to the inline policy (if the policy already contains a `Statement` array, append only the `GlueSessionPermissions` object to it):

  ```json
  {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Sid": "GlueSessionPermissions",
        "Effect": "Allow",
        "Action": [
          "glue:RunStatement",
          "glue:GetStatement",
          "glue:ListStatements",
          "glue:CancelStatement",
          "glue:StopSession",
          "glue:DeleteSession",
          "glue:GetSession",
          "glue:CreateSession",
          "glue:ListSessions",
          "glue:TagResource",
          "glue:UntagResource"
        ],
        "Resource": "*"
      }
    ]
  }
  ```

- Create a JupyterLab Notebook: Open the studio and create a new JupyterLab notebook for your data processing tasks.
- Launch Glue PySpark Kernel: Finally, launch the Glue PySpark kernel in your JupyterLab notebook to start running your data workflows in Glue Sessions.
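If you prefer to script the role changes instead of using the console, the following Python sketch covers the Locate the User Profile Role and Update Trust Relationship steps. It assumes boto3 is installed and that your AWS credentials may call `iam:GetRole` and `iam:UpdateAssumeRolePolicy` on the role. The helper names are hypothetical and not part of Amorphic; the trust-policy logic is kept in a pure function so it can be checked without AWS access.

```python
import copy
import json
from urllib.parse import unquote

GLUE_PRINCIPAL = "glue.amazonaws.com"


def user_profile_role_name(project_short_name: str, studio_id: str) -> str:
    """Build the role name from the documented {ProjectShortName}-custom-{studio_id}-usr-Role format."""
    return f"{project_short_name}-custom-{studio_id}-usr-Role"


def _statements(policy: dict) -> list:
    """Return the policy's statements as a list (IAM also accepts a single statement object)."""
    statements = policy.get("Statement", [])
    return [statements] if isinstance(statements, dict) else list(statements)


def _service_principals(statement: dict) -> list:
    """Return a statement's Service principals as a list (IAM allows a string or a list)."""
    principal = statement.get("Principal")
    if not isinstance(principal, dict):  # e.g. "*" or no Principal at all
        return []
    services = principal.get("Service", [])
    return [services] if isinstance(services, str) else list(services)


def add_glue_trust(trust_policy: dict) -> dict:
    """Return a copy of the trust policy that lets AWS Glue assume the role.

    Existing statements are kept as they are. A new sts:AssumeRole statement for
    glue.amazonaws.com is appended only if no Allow statement already lists it
    as a Service principal.
    """
    policy = copy.deepcopy(trust_policy)
    statements = _statements(policy)
    policy["Statement"] = statements
    if any(s.get("Effect") == "Allow" and GLUE_PRINCIPAL in _service_principals(s) for s in statements):
        return policy
    statements.append({
        "Effect": "Allow",
        "Principal": {"Service": GLUE_PRINCIPAL},
        "Action": "sts:AssumeRole",
    })
    return policy


def apply_glue_trust(role_name: str) -> None:
    """Read the role's trust policy with boto3, add Glue, and write it back."""
    import boto3  # imported here so the helpers above can be used without boto3

    iam = boto3.client("iam")
    document = iam.get_role(RoleName=role_name)["Role"]["AssumeRolePolicyDocument"]
    if isinstance(document, str):  # boto3 normally returns a parsed dict already
        document = json.loads(unquote(document))
    iam.update_assume_role_policy(
        RoleName=role_name,
        PolicyDocument=json.dumps(add_glue_trust(document)),
    )
```

For example, `apply_glue_trust(user_profile_role_name("myproj", "abc123"))` would update the role for a hypothetical project `myproj` and studio ID `abc123`. Appending a separate statement, rather than editing an existing principal list, leaves the services the role already trusts untouched.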
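The Modify Inline Policy step can also be scripted with boto3. This sketch assumes boto3 is installed and that your credentials may call `iam:GetRolePolicy` and `iam:PutRolePolicy`. The name of the custom inline policy is not given in this guide, so `policy_name` is a placeholder you must supply; the function names are hypothetical.

```python
import copy
import json
from urllib.parse import unquote

# The statement from the Modify Inline Policy step.
GLUE_SESSION_STATEMENT = {
    "Sid": "GlueSessionPermissions",
    "Effect": "Allow",
    "Action": [
        "glue:RunStatement",
        "glue:GetStatement",
        "glue:ListStatements",
        "glue:CancelStatement",
        "glue:StopSession",
        "glue:DeleteSession",
        "glue:GetSession",
        "glue:CreateSession",
        "glue:ListSessions",
        "glue:TagResource",
        "glue:UntagResource",
    ],
    "Resource": "*",
}


def merge_statement(policy: dict, statement: dict) -> dict:
    """Return a copy of the policy with the statement appended.

    Any existing statement with the same Sid is replaced, so running this twice
    does not create duplicates. Other statements keep their original order.
    """
    merged = copy.deepcopy(policy)
    statements = merged.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be given as an object
        statements = [statements]
    sid = statement.get("Sid")
    if sid:
        statements = [s for s in statements if s.get("Sid") != sid]
    merged["Statement"] = statements + [copy.deepcopy(statement)]
    merged.setdefault("Version", "2012-10-17")
    return merged


def apply_glue_session_permissions(role_name: str, policy_name: str) -> None:
    """Add GLUE_SESSION_STATEMENT to an existing inline policy with boto3.

    policy_name is the studio's custom inline policy name (a placeholder here);
    it is listed on the role's Permissions tab in the IAM console.
    """
    import boto3  # imported here so merge_statement can be used without boto3

    iam = boto3.client("iam")
    document = iam.get_role_policy(RoleName=role_name, PolicyName=policy_name)["PolicyDocument"]
    if isinstance(document, str):  # boto3 normally returns a parsed dict already
        document = json.loads(unquote(document))
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(merge_statement(document, GLUE_SESSION_STATEMENT)),
    )
```

Matching on the `GlueSessionPermissions` Sid makes the update safe to re-run, and every other statement in the inline policy is preserved.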
Note: After the 3.0.1 release, Glue Sessions will be available in studios without any additional setup. For a comprehensive guide on how to use Glue Sessions, refer to the provided link.
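When you launch the Glue PySpark kernel, your notebook code runs in a Glue Session that is managed through the AWS Glue API, which is why the inline policy grants actions such as `glue:CreateSession`, `glue:RunStatement`, and `glue:GetStatement`. The following sketch approximates that session lifecycle for illustration only; it is not the kernel's implementation. It assumes `glue` is a client created with `boto3.client("glue")` and `role_arn` is the ARN of the user profile role; the function name, Glue version, worker settings, and idle timeout are hypothetical example values.

```python
import contextlib
import time

SESSION_END_STATES = {"READY", "FAILED", "TIMEOUT", "STOPPED"}
STATEMENT_END_STATES = {"AVAILABLE", "CANCELLED", "ERROR"}


def _poll(fetch_state, end_states, poll_seconds):
    """Call fetch_state() repeatedly until it returns one of end_states."""
    while True:
        state = fetch_state()
        if state in end_states:
            return state
        time.sleep(poll_seconds)


def run_in_glue_session(glue, session_id, role_arn, code, poll_seconds=5.0):
    """Create a Glue session, run one code statement in it, and return its text output."""
    glue.create_session(
        Id=session_id,
        Role=role_arn,
        Command={"Name": "glueetl", "PythonVersion": "3"},  # a PySpark session
        GlueVersion="4.0",   # example value
        WorkerType="G.1X",   # example value
        NumberOfWorkers=2,   # example value
        IdleTimeout=30,      # minutes; example value
    )
    try:
        session_state = _poll(
            lambda: glue.get_session(Id=session_id)["Session"]["Status"],
            SESSION_END_STATES, poll_seconds)
        if session_state != "READY":
            raise RuntimeError(f"Session {session_id} ended in state {session_state}")

        statement_id = glue.run_statement(SessionId=session_id, Code=code)["Id"]

        def get_statement():
            return glue.get_statement(SessionId=session_id, Id=statement_id)["Statement"]

        _poll(lambda: get_statement()["State"], STATEMENT_END_STATES, poll_seconds)
        statement = get_statement()  # fetch once more to read the final output
        if statement["State"] != "AVAILABLE":
            raise RuntimeError(f"Statement {statement_id} ended in state {statement['State']}")
        output = statement.get("Output", {})
        if output.get("Status") == "ERROR":
            raise RuntimeError(f"{output.get('ErrorName')}: {output.get('ErrorValue')}")
        return output.get("Data", {}).get("TextPlain", "")
    finally:
        # Best-effort cleanup: the session may already be stopped or failed.
        # IdleTimeout still bounds the session if these calls fail.
        with contextlib.suppress(Exception):
            glue.stop_session(Id=session_id)
        with contextlib.suppress(Exception):
            glue.delete_session(Id=session_id)
```

For example, `run_in_glue_session(boto3.client("glue"), "my-test-session", role_arn, "print(1 + 1)")` should return the statement's plain-text output. Passing the client in as a parameter keeps the function testable with a stub instead of a live AWS account.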