Unlocking Enterprise Image Descriptions: Harnessing the Power of Nasuni's S3 Edge API and LLMs for Precision Naming
In a prior article I showed how to use Llamafile with a local LLM to give images descriptive names, making them easier to find.
Today we will do something very similar, but this time we will use Nasuni's S3-compatible Edge API (for unstructured file data managed by Nasuni) and Anthropic's Claude to interrogate the images.
Imagine a company with directories full of images whose names are simply letters, numbers, or a combination of both (as they often are). This makes it very difficult for users to find and identify an image without opening each one.
We will use Claude's multimodal AI capabilities, via its API, to identify each image and rename it accordingly:
Requirements:
Starting Point:
An image directory on a Nasuni Edge Appliance. Because Nasuni provides a compatible S3 API, the directory can be browsed with any S3-compatible client, such as Cyberduck:
As Nasuni's S3 Edge is compatible with the S3 API, we can use Boto3, the Python library for interacting with S3. It enables us to perform various operations on S3 buckets and objects.
Code:
import boto3
import base64
import os
import io
from botocore.exceptions import ClientError
from anthropic import Anthropic

# Nasuni S3 edge configuration
NEA_ADDR = '<enter_address>'
A_KEY = 'S3_Edge_Key'
S_KEY = 'S3_Edge_Secret_key'
BUCKET_NAME = 'enter_bucket_name'
IMAGE_DIRECTORY = '<image_DIR>'  # Specify the directory (prefix) where images are located

# Initialize the Anthropic client
anthropic = Anthropic(api_key='anthropic_api_key')

# Create an S3 client pointed at the Nasuni Edge endpoint
s3_client = boto3.client(
    's3',
    endpoint_url=f'https://{NEA_ADDR}/',
    aws_access_key_id=A_KEY,
    aws_secret_access_key=S_KEY
)

# Map image extensions to the media type Claude expects
MEDIA_TYPES = {
    '.jpg': 'image/jpeg',
    '.jpeg': 'image/jpeg',
    '.png': 'image/png',
    '.gif': 'image/gif',
}

def test_connection():
    try:
        response = s3_client.list_buckets()
        print("Connection successful!")
        print("Available buckets:")
        for bucket in response['Buckets']:
            print(f"  {bucket['Name']}")
        return True
    except Exception as e:
        print(f"Connection failed: {e}")
        return False

def get_object_from_s3(key):
    try:
        response = s3_client.get_object(Bucket=BUCKET_NAME, Key=key)
        return response['Body'].read()
    except ClientError as e:
        print(f"Error reading object from S3: {e}")
        return None

def sanitize_filename(filename):
    if filename is None:
        print("Warning: Filename is None")
        return "unknown_file"
    filename = ''.join(c for c in filename if c.isalnum() or c in (' ', '_', '-'))
    return filename[:50]  # Limit to 50 characters to avoid overly long filenames

def get_image_description(image_data, media_type='image/jpeg'):
    if image_data is None:
        print("Error: image_data is None")
        return None
    try:
        encoded_image = base64.b64encode(image_data).decode('utf-8')
        response = anthropic.messages.create(
            model="claude-3-opus-20240229",
            max_tokens=300,
            messages=[
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "image",
                            "source": {
                                "type": "base64",
                                "media_type": media_type,
                                "data": encoded_image
                            }
                        },
                        {
                            "type": "text",
                            "text": "Describe this image in a brief, concise manner suitable for a filename."
                        }
                    ]
                }
            ]
        )
        return response.content[0].text
    except Exception as e:
        print(f"Error getting description: {e}")
        return None

def process_image(key):
    try:
        print(f"Processing image: {key}")
        image_data = get_object_from_s3(key)
        if not image_data:
            print(f"Failed to get image data for {key}")
            return False
        extension = os.path.splitext(key)[1].lower()
        description = get_image_description(image_data, MEDIA_TYPES.get(extension, 'image/jpeg'))
        if description:
            sanitized_description = sanitize_filename(description)
            new_key = os.path.join(os.path.dirname(key), sanitized_description + extension).replace('\\', '/')
            print(f"Uploading object with new name: {new_key}")
            try:
                s3_client.put_object(Bucket=BUCKET_NAME, Key=new_key, Body=io.BytesIO(image_data))
                print(f"Successfully uploaded: {new_key}")
            except ClientError as e:
                print(f"Error uploading new object: {e}")
                return False
            print(f"Deleting original object: {key}")
            try:
                s3_client.delete_object(Bucket=BUCKET_NAME, Key=key)
                print(f"Successfully deleted: {key}")
            except ClientError as e:
                print(f"Error deleting original object: {e}")
                return False
            print(f'{key} renamed to {new_key}')
            return True
        else:
            print(f'No description returned for {key}')
            return False
    except ClientError as e:
        print(f'An error occurred processing {key}: {e}')
        return False

def list_objects(bucket, prefix=''):
    try:
        paginator = s3_client.get_paginator('list_objects_v2')
        page_iterator = paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter='/')
        for page in page_iterator:
            if 'Contents' in page:
                for obj in page['Contents']:
                    yield obj
            if 'CommonPrefixes' in page:
                for common_prefix in page['CommonPrefixes']:
                    # Recurse into subdirectories
                    yield from list_objects(bucket, common_prefix['Prefix'])
    except ClientError as e:
        print(f"Error listing objects: {e}")

def main():
    if not test_connection():
        print("Failed to connect to S3. Exiting.")
        return
    processed_count = 0
    try:
        print(f"\nListing objects in bucket: {BUCKET_NAME}, directory: {IMAGE_DIRECTORY}")
        for obj in list_objects(BUCKET_NAME, IMAGE_DIRECTORY):
            key = obj['Key']
            if key.lower().endswith(('.jpg', '.jpeg', '.png', '.gif')):
                print(f'Processing file: {key}')
                if process_image(key):
                    processed_count += 1
        print(f'Completed processing {processed_count} images')
        # Verify the results
        print("\nVerifying results...")
        remaining_images = list(list_objects(BUCKET_NAME, IMAGE_DIRECTORY))
        print(f"Number of objects remaining in {IMAGE_DIRECTORY}: {len(remaining_images)}")
        for obj in remaining_images:
            print(f"  {obj['Key']}")
    except Exception as e:
        print(f"An error occurred: {e}")
        import traceback
        traceback.print_exc()

if __name__ == "__main__":
    main()
Before running the script, be sure to add the Nasuni Edge address, the Nasuni (S3) bucket name, the Nasuni (S3) keys, and your Anthropic Claude API key.
Once this is done, we can run the script. It interrogates the bucket, finds the images in the directory, and sends each one to the multimodal Claude LLM for identification. Each image is then written back to the S3 bucket under its new name and the original object is deleted (as S3 does not have a rename capability).
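Since S3 has no native rename, the script re-uploads each image's bytes under the new key and deletes the old one. An alternative is a server-side copy followed by a delete, which avoids re-uploading the data (the bytes still have to be downloaded once for Claude). A minimal sketch, assuming the Nasuni S3 Edge endpoint supports the standard CopyObject operation (worth verifying on any S3-compatible store); `rename_object` is a hypothetical helper name:

```python
def rename_object(s3_client, bucket, old_key, new_key):
    """'Rename' an S3 object: server-side copy to the new key, then delete the original."""
    s3_client.copy_object(
        Bucket=bucket,
        Key=new_key,
        CopySource={'Bucket': bucket, 'Key': old_key},
    )
    s3_client.delete_object(Bucket=bucket, Key=old_key)
```

In `process_image`, this could replace the `put_object`/`delete_object` pair once the description is known.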
The Cyberduck Output after the script has been run:
Image example: Lifeguard tower on a beach at sunset.jpg
In summary, we successfully utilized Claude, Anthropic's multimodal model (also available through Amazon Bedrock), with Nasuni's S3-compatible API to satisfy an image-renaming use case.
It would be straightforward to extend this script by adding richer image metadata tagging (in addition to renaming the images).
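As a sketch of that extension, the LLM-generated description could be attached to the object as an S3 tag instead of (or alongside) renaming it, which keeps the original key stable and makes the description queryable. This assumes the Nasuni S3 Edge endpoint implements the S3 object-tagging API (`put_object_tagging`), which should be verified first; `tag_image` is a hypothetical helper:

```python
def tag_image(s3_client, bucket, key, description):
    """Attach an LLM-generated description to an existing object as an S3 tag.

    S3 tag values are limited to 256 characters, so the description is truncated.
    """
    s3_client.put_object_tagging(
        Bucket=bucket,
        Key=key,
        Tagging={'TagSet': [{'Key': 'description', 'Value': description[:256]}]},
    )
```

This could be called from `process_image` right after the description comes back from Claude.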
So what use cases could this be good for?
Media and Entertainment:
Content Organization:
Automate the process of identifying and renaming images in media libraries, making it easier for production teams to search and organize visual assets.
Metadata Enrichment:
Enhance image metadata by adding descriptive tags and accurate filenames, improving asset discoverability and reusability.
E-Commerce and Retail:
Product Catalog Management:
Automatically recognize and rename product images based on their content, ensuring consistent and accurate product listings.
Visual Search Optimization:
Improve the effectiveness of visual search tools by ensuring images are correctly labeled and tagged, enhancing the customer shopping experience.
Marketing and Advertising:
Campaign Asset Management:
Streamline the management of marketing campaign assets by automatically identifying and renaming images, making it easier to retrieve and deploy visual content.
Brand Consistency:
Ensure all marketing materials are consistently labeled and organized, maintaining brand integrity across various platforms.
Publishing and Media:
Editorial Asset Management:
Facilitate the organization of editorial images by automatically identifying and renaming them according to their content, aiding in the efficient production of publications.
Archive Management:
Improve the management of historical image archives by accurately tagging and renaming images, making it easier to retrieve and use archived content.
Healthcare:
Medical Imaging Management:
Automate the identification and renaming of medical images (e.g., X-rays, MRIs) based on their content, improving the organization and retrieval of patient records.
There is a lot of room for small AI use cases such as this to provide value within the enterprise.