Version: v3.0

Libraries

Libraries are an extension of external job libraries. They are mainly used to maintain a central repository of organization-approved libraries/packages to be used across multiple Jobs or Data labs.

These Libraries have the following capabilities:

  • They allow users to have multiple packages attached to a job, so they can easily switch between them to perform various actions based on the job requirements.
  • They provide the ability to customize job dependencies at a granular level.
  • They offer the flexibility to choose among different types of packages.
Note

Currently, depending on the type of ETL Job, Amorphic supports the "py", "egg", and "whl" extensions for Python shell applications and the "py", "zip", and "jar" extensions for PySpark applications.
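
The extension rules in this note can be expressed as a small lookup. The following is a minimal sketch, not Amorphic's actual validation logic; the job-type keys `python_shell` and `pyspark` and the helper `is_supported_package` are hypothetical names chosen for illustration.

```python
from pathlib import PurePosixPath

# Allowed package extensions per ETL job type, taken from the note above.
# The dictionary keys are illustrative names, not Amorphic identifiers.
SUPPORTED_EXTENSIONS = {
    "python_shell": {"py", "egg", "whl"},
    "pyspark": {"py", "zip", "jar"},
}


def is_supported_package(job_type: str, file_name: str) -> bool:
    """Return True if the file's extension is allowed for the given job type."""
    allowed = SUPPORTED_EXTENSIONS.get(job_type, set())
    extension = PurePosixPath(file_name).suffix.lstrip(".").lower()
    return extension in allowed


print(is_supported_package("python_shell", "helpers-1.0-py3-none-any.whl"))
print(is_supported_package("pyspark", "helpers-1.0-py3-none-any.whl"))
```

A check like this could run before a package is uploaded, so that an unsupported file is rejected early rather than failing when the job starts.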

Library

A Library is a collection of packages/modules that provides standardized solutions to everyday programming problems. Unlike the modules that ship with the OS-provided Python, these packages are explicitly designed by a user, an organization, or the open-source community. This improves the portability of Python programs by abstracting platform-specific APIs into platform-neutral ones.

The ETL Library has the following properties:

  • A Library can have multiple packages attached to it.
  • A Library can be attached to multiple Jobs.

Types of Amorphic ETL Libraries:

  • External Libraries: Their scope is limited to a single ETL job, and they are removed when the user deletes that job.
  • Shared Libraries: They have a universal scope, so multiple jobs can use the same shared library upon user authentication. They persist in the central repository even after an ETL job that used them has been deleted.

Amorphic Libraries contain the following information:

Type | Description
Library Name | Uniquely identifies the functionality of the library.
Library Description | A brief explanation of the library, typically the contents/packages inside it.
Packages | A file or a list of files that can be imported into an ETL Job to perform a specific set of operations. Example: matplotlib, a plotting library that data scientists and analysts use for visualizations.
Jobs | The list of ETL jobs to which the library is attached.
CreatedBy | User who created the library.
LastModifiedBy | User who most recently updated the library.
LastModifiedTime | Timestamp of the most recent update to the library.

Library Operations

Amorphic provides the following operations to manage libraries:

Create Library

To create a new Library in Amorphic, go to the "Create New Library" section under "Libraries". A library can have zero or more packages and jobs attached to it. After creating the Library, the user can view, update, and delete it. These operations are available only to users who have permission to access the library.

Note

Users cannot delete a shared library that is attached to an existing ETL Job. When a user attempts to delete such a library, a pop-up lists the dependent ETL Jobs. The user should then detach the library from all of those Jobs and retry the deletion.
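
The deletion rule in this note amounts to a dependency check. The sketch below models it with an in-memory registry; `SharedLibrary`, `delete_library`, and the job name `daily_load` are hypothetical and only illustrate the behaviour described here, not how Amorphic implements it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class SharedLibrary:
    """Illustrative in-memory model of a shared library (hypothetical)."""
    name: str
    packages: List[str] = field(default_factory=list)
    jobs: Set[str] = field(default_factory=set)  # names of attached ETL jobs


def delete_library(registry: Dict[str, SharedLibrary], name: str) -> List[str]:
    """Delete the library only if no job depends on it.

    Returns the sorted list of dependent jobs; an empty list means the
    library was deleted.
    """
    dependents = sorted(registry[name].jobs)
    if not dependents:
        del registry[name]
    return dependents


registry = {
    "utils": SharedLibrary("utils", ["amorphicutils-0.3.1.zip"], {"daily_load"})
}
print(delete_library(registry, "utils"))      # blocked: lists the dependent job
registry["utils"].jobs.discard("daily_load")  # detach the library from the job
print(delete_library(registry, "utils"))      # succeeds: no dependents remain
```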

The GIF below shows how a user can create a new library.

Create ETL Library

View Library

To view existing library information, the user must have sufficient permissions. Click the library name under the "Libraries" section inside Shared Resources to view the library.

The GIF below shows how a user can view the library information in detail.

View library

Attach Library

Users can attach a library from the job details page, or attach a shared library to a job while creating or updating it. Amorphic lists the available shared libraries alongside the other job parameters, and the user selects the ones to attach. Once a shared library is attached, all of its packages are passed to the job as arguments automatically, without any manual intervention.
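
This page does not spell out the argument format. As a rough illustration only, the sketch below assumes a convention similar to AWS Glue's `--extra-py-files` and `--extra-jars` job parameters, where package locations are passed as comma-separated lists. The function name, the bucket paths, and the grouping rule are assumptions, not documented Amorphic behaviour.

```python
def build_job_arguments(package_paths):
    """Group package paths into comma-separated job arguments by file type."""
    python_files = [p for p in package_paths if not p.endswith(".jar")]
    jar_files = [p for p in package_paths if p.endswith(".jar")]

    arguments = {}
    if python_files:
        arguments["--extra-py-files"] = ",".join(python_files)
    if jar_files:
        arguments["--extra-jars"] = ",".join(jar_files)
    return arguments


# Hypothetical package locations for a shared library.
packages = [
    "s3://example-bucket/libs/amorphicutils-0.3.1.zip",
    "s3://example-bucket/libs/helpers.py",
    "s3://example-bucket/libs/connector.jar",
]
print(build_job_arguments(packages))
```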

Follow the GIF below to attach a shared ETL library to an existing ETL Job.

Attach Library

Importing and using a library

If a library contains a single version of the required module, or several different files, the user can import the module and use it directly.

Python
from amorphicutils.common import read_param_store
print(read_param_store("SYSTEM.S3BUCKET.DLZ", secure=False)['data'])

If a library contains multiple versions of the required module, the user should explicitly insert the versioned file at the front of the system path before importing the module. This ensures that the import picks up the intended version rather than whichever one happens to be found first.

Python
import sys
# Explicitly specify the version you want to use by placing it first on the path
sys.path.insert(0, "amorphicutils-0.3.1.zip")
from amorphicutils.common import read_param_store
print(read_param_store("SYSTEM.S3BUCKET.DLZ", secure=False)['data'])
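
To see why inserting at position 0 selects the version, the following self-contained sketch builds two throwaway archives and imports from them. The module name `demo_pkg_versions` and the archive names are hypothetical and stand in for a real versioned package; the example relies only on Python's standard ability to import from zip files on `sys.path`.

```python
import importlib
import sys
import tempfile
import zipfile
from pathlib import Path

# Build two throwaway archives that each contain a different version of a
# hypothetical module named "demo_pkg_versions".
work_dir = Path(tempfile.mkdtemp())
for version in ("0.3.0", "0.3.1"):
    archive = work_dir / f"demo-{version}.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("demo_pkg_versions.py", f'VERSION = "{version}"\n')
    # Appending mimics archives that are already on the path in some order.
    sys.path.append(str(archive))

# Put the wanted version first so the import system finds it before the other.
sys.path.insert(0, str(work_dir / "demo-0.3.1.zip"))
importlib.invalidate_caches()

import demo_pkg_versions

print(demo_pkg_versions.VERSION)  # 0.3.1
```

Without the `sys.path.insert(0, ...)` line, the import would resolve to the 0.3.0 archive, because it appears earlier on the path.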