AI-Based Automated Data Extraction for Product Listing

Background:

A leading beauty and personal care e-commerce platform aimed to automate ingredient extraction from product images using Generative AI. The objective was to streamline product listing, improve accuracy, and reduce the manual effort of extracting key ingredient details and their benefits.


To achieve this, Oneture Technologies developed an AI-powered solution built on Amazon SageMaker that automatically extracts and validates ingredient information and integrates it into the client's product management system.

Challenges Faced:

The existing manual process of extracting ingredient details from product images was time-consuming, error-prone, and hard to scale. With a high volume of product listings, the company needed an automated, AI-driven solution to ensure accuracy, efficiency, and compliance.

  • Manual Data Entry Issues: Extracting ingredient details manually from product packaging was time-consuming and error-prone.
  • Scalability Concerns: The client handles a high volume of product listings, requiring an automated, high-throughput solution.
  • Data Compliance & Security: Ensuring data privacy while processing product information was a critical requirement.

    Objectives:

    The key objectives of the project were:

    • Automate ingredient extraction from product images using AI.
    • Improve accuracy of product listings by reducing human errors.
    • Reduce manual effort and enhance operational efficiency.
    • Ensure compliance with data privacy and security standards.
    • Seamlessly integrate AI-extracted data into the client's product management system.

    Solution Summary:

    Oneture Technologies developed an AI-powered ingredient extraction system leveraging Amazon SageMaker and Generative AI. The solution automates text extraction from product images, validates ingredient details, and integrates the data into the client's product management system.

    Key Functionalities:

    • AI-Driven Ingredient Extraction: The AI model automatically detects ingredient text from product images.
    • NLP-Based Benefit Identification: Extracts relevant benefits associated with each ingredient using Generative AI models.
    • Automated Validation & Accuracy Checks: Ensures high confidence levels before adding extracted data to the product catalog.
    • Seamless Integration with the Client's System: Provides API-based data transfer to the client's product database.
    • User Interface for Manual Verification: Enables human review for edge cases and quality assurance.
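
The validation and manual-review functionalities above can be pictured as a simple confidence gate. The following is only a minimal sketch under stated assumptions: the `ExtractedIngredient` type, the `route` function, and the 0.90 threshold are hypothetical illustrations, not details of the delivered system.

```python
from dataclasses import dataclass

# Hypothetical cutoff; the case study does not state the actual value.
AUTO_APPROVE_THRESHOLD = 0.90


@dataclass
class ExtractedIngredient:
    name: str
    benefit: str
    confidence: float  # assumed model-reported score in [0, 1]


def route(items, threshold=AUTO_APPROVE_THRESHOLD):
    """Split extracted items into auto-approved and manual-review lists."""
    approved, review = [], []
    for item in items:
        # Blank names or low-confidence items are sent to human review.
        if item.name.strip() and item.confidence >= threshold:
            approved.append(item)
        else:
            review.append(item)
    return approved, review
```

In such a design, only the `approved` list would be pushed to the catalog automatically, while the `review` list would feed the manual verification interface.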

    Key Features:

    The AI-powered system combines Generative AI and NLP to automate ingredient extraction with high accuracy and efficiency. The solution is designed for scalability, real-time data processing, and secure integration with the client's product management system.

    • AI-Powered Ingredient Recognition – Uses Generative AI and NLP to extract ingredient details and their associated benefits from product images with high accuracy.
    • Context-Aware Data Extraction – Goes beyond basic text recognition by understanding product descriptions and intelligently associating ingredients with their benefits.
    • Automated Quality Validation – AI-driven validation checks ensure extracted ingredient details are accurate, reducing the need for manual corrections.
    • Smart Categorization & Tagging – Classifies extracted ingredients into predefined categories, making product data more searchable and structured.
    • User-Friendly Manual Review Interface – Allows the client's team to verify and refine extracted data through a simple yet efficient UI for edge cases.
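
As an illustration of the categorization and tagging feature, the sketch below groups ingredient names under predefined categories using a plain lookup table. The `CATEGORY_MAP` contents and the function names are hypothetical; the case study does not describe the actual taxonomy or classification method, which may well be model-driven rather than dictionary-based.

```python
# Hypothetical category map; the real taxonomy is not given in the source.
CATEGORY_MAP = {
    "niacinamide": "vitamin",
    "hyaluronic acid": "humectant",
    "glycerin": "humectant",
    "salicylic acid": "exfoliant",
}


def normalize(name: str) -> str:
    """Lowercase the name and collapse repeated whitespace."""
    return " ".join(name.lower().split())


def tag_ingredients(names):
    """Group normalized ingredient names by category."""
    tagged = {}
    for raw in names:
        key = normalize(raw)
        # Unknown ingredients fall into a catch-all bucket for later review.
        category = CATEGORY_MAP.get(key, "uncategorized")
        tagged.setdefault(category, []).append(key)
    return tagged
```

Keeping an explicit "uncategorized" bucket is one way to surface unknown ingredients to reviewers instead of silently dropping them.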

    Tech Stack:

    • Amazon SageMaker-powered AI Models for real-time data extraction.
    • Generative AI & NLP-based text recognition for extracting ingredient details.
    • AWS Lambda & API Gateway for automated data processing.
    • Cloud-based Data Storage (Amazon S3) for secure image handling.
    • Custom API Integration with Nykaa’s product management system.
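
To show how these components might fit together, here is a rough sketch of an AWS Lambda handler that reacts to an image upload in Amazon S3 and calls a SageMaker endpoint. The endpoint name, the JSON request and response schema, and the optional `runtime` parameter (added so the function can be exercised without AWS access) are assumptions made for illustration, not the project's actual configuration.

```python
import json
import os
from urllib.parse import unquote_plus

# Hypothetical endpoint name, read from the environment if present.
ENDPOINT_NAME = os.environ.get("ENDPOINT_NAME", "ingredient-extractor")


def parse_s3_event(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    return [
        # Object keys arrive URL-encoded in S3 event notifications.
        (r["s3"]["bucket"]["name"], unquote_plus(r["s3"]["object"]["key"]))
        for r in event.get("Records", [])
    ]


def handler(event, context, runtime=None):
    """Invoke the extraction endpoint once per uploaded image."""
    if runtime is None:
        import boto3  # bundled with the AWS Lambda Python runtime
        runtime = boto3.client("sagemaker-runtime")

    results = []
    for bucket, key in parse_s3_event(event):
        # Assumed request schema: the model fetches the image itself.
        payload = json.dumps({"bucket": bucket, "key": key})
        response = runtime.invoke_endpoint(
            EndpointName=ENDPOINT_NAME,
            ContentType="application/json",
            Body=payload,
        )
        # Assumed response schema: a JSON document in the response body.
        results.append(json.loads(response["Body"].read()))
    return {"statusCode": 200, "body": json.dumps(results)}
```

Passing the image location rather than the image bytes keeps the request small; a real deployment could equally send the bytes directly, depending on how the model container is written.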