Unleashing the Power of Amazon Kendra: Indexing and Vectorizing RDS (MSSQL) Databases

In today's data-driven world, organizations are bombarded with an ever-increasing volume of unstructured data, such as documents, emails, and multimedia files. Effectively managing and extracting valuable insights from this vast trove of information can be a daunting challenge. This is where Amazon Kendra, a highly accurate and easy-to-use enterprise search service powered by machine learning, comes into play.

Amazon Kendra leverages advanced natural language processing (NLP) techniques, including indexing and vectorization, to intelligently understand and organize unstructured data, enabling users to quickly and efficiently find the information they need. Indexing is the process of creating a searchable repository of documents, while vectorization involves representing textual data as high-dimensional vectors, enabling Kendra to understand the semantic relationships between words, phrases, and documents.
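To make the two terms concrete, here is a toy sketch in Python. It is not how Kendra works internally (those details are not described in this post); it simply uses word-count vectors and cosine similarity as a stand-in for the learned embeddings a real semantic search service would use. The function names and sample documents are made up for illustration.

```python
import math
from collections import Counter
from typing import Dict, List


def vectorize(text: str) -> Counter:
    """Turn text into a sparse word-count vector (a toy stand-in for embeddings)."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(documents: Dict[str, str]) -> Dict[str, Counter]:
    """Indexing: store one vector per document ID so it can be searched later."""
    return {doc_id: vectorize(text) for doc_id, text in documents.items()}


def search(index: Dict[str, Counter], query: str, top_k: int = 1) -> List[str]:
    """Rank document IDs by similarity between the query vector and each document vector."""
    query_vector = vectorize(query)
    ranked = sorted(
        index,
        key=lambda doc_id: cosine_similarity(index[doc_id], query_vector),
        reverse=True,
    )
    return ranked[:top_k]


if __name__ == "__main__":
    docs = {
        "kb-1": "reset your database password",
        "kb-2": "quarterly sales report",
    }
    print(search(build_index(docs), "database password reset"))
```

Word counts only capture overlapping words, not meaning. The point of learned vectors is that a query and a document can be close even when they share no words.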

Whether you're a data scientist, a software engineer, or a business professional seeking to harness the power of enterprise search, this blog post will equip you with the knowledge and tools to leverage Amazon Kendra effectively, empowering you to make better-informed decisions and drive innovation within your organization.

So, what is Amazon Kendra?

Amazon Kendra is an intelligent search service that uses natural language processing and advanced machine learning algorithms to return specific answers to search questions from your data. Unlike traditional keyword-based search, Amazon Kendra uses its semantic and contextual understanding capabilities to decide whether a document is relevant to a search query. It returns specific answers to questions, giving users an experience that's close to interacting with a human expert.

One of the biggest advantages of using Kendra is that it handles both indexing and vectorization, providing built-in vectorization capabilities. This ensures that Kendra not only organizes data efficiently but also understands and retrieves information based on its contextual relevance.

In this blog post, we will walk through how to create a Kendra index and attach a data source to it. We assume you already have an RDS (MSSQL) database that you want to index.

Step 1: Creating a Kendra Index in AWS Console

1. Log in to your AWS account and search for Amazon Kendra in the console:

   - Once the Amazon Kendra page opens, click "Create an Index".


2. Give your index a name and description:

   - For IAM Role, choose the "Create a new role" option in the dropdown and give the role a name. The role name will be prefixed with 'AmazonKendra-us-east-1-' (here, for the us-east-1 region).

   - Attach tags if you wish, and then click "Next".

   - Leave the default settings on this page. If you want to enable token-based access control for your indexed documents, choose "Yes" in Access Control Settings.

   - For user-group expansion, choose according to your needs. Click "Next".

3. Choose the Kendra edition:

   - Amazon Kendra comes with two editions:

      - Developer Edition: Supports up to 10,000 documents and up to 4,000 queries per day, and runs in 1 Availability Zone (AZ). Use this edition to test Amazon Kendra's capabilities and to build a proof-of-concept application in a development environment.

      - Enterprise Edition: Supports up to 100,000 documents and up to 8,000 queries per day, and runs in 3 Availability Zones (AZs). Use this edition for your production applications. You can add storage and query units as needed.

   - Choose the edition most appropriate for you and click "Next".

   - For more information about Kendra editions, refer to the Amazon Kendra documentation and pricing information.

   - Review all the details and click "Create".
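If you prefer scripting to clicking, the console steps above roughly correspond to a single `create_index` call in boto3. The sketch below is a minimal illustration, not a drop-in script: the helper names, the `production` flag, and the default region are my own assumptions, and the edition thresholds are simply the figures quoted in this post. The IAM role must already exist and be assumable by Kendra.

```python
from typing import Any, Dict

# Developer Edition limits as quoted in this post.
DEVELOPER_MAX_DOCS = 10_000
DEVELOPER_MAX_QUERIES_PER_DAY = 4_000


def choose_edition(doc_count: int, queries_per_day: int, production: bool) -> str:
    """Pick an edition: Developer only for small, non-production workloads."""
    if production:
        return "ENTERPRISE_EDITION"
    if doc_count > DEVELOPER_MAX_DOCS or queries_per_day > DEVELOPER_MAX_QUERIES_PER_DAY:
        return "ENTERPRISE_EDITION"
    return "DEVELOPER_EDITION"


def build_create_index_request(
    name: str, description: str, role_arn: str, edition: str
) -> Dict[str, Any]:
    """Assemble the keyword arguments for the Kendra create_index call."""
    return {
        "Name": name,
        "Description": description,
        "RoleArn": role_arn,
        "Edition": edition,
    }


def create_index(request: Dict[str, Any], region: str = "us-east-1") -> str:
    """Send the request to Kendra and return the new index ID."""
    import boto3  # imported lazily so the helpers above work without AWS access

    client = boto3.client("kendra", region_name=region)
    return client.create_index(**request)["Id"]
```

A call might look like `create_index(build_create_index_request("my-index", "PoC index", role_arn, choose_edition(5000, 500, production=False)))`, where `role_arn` is the ARN of the role you created for Kendra. Index creation is asynchronous, so the index takes some time to become active after the call returns.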

Step 2: Attaching a Data Source

Now that you have created an index, let us attach a data source to it. In our case, we are attaching an RDS database as the data source; choose whichever data source fits your needs.

1. Navigate to the Indexes page through the left navigation pane:

   - You will see the index you just created. Click on it.

   - When the index opens, click "Add data source".



2. Choose the data source type:

   - There are numerous data source options available. Let us use "Amazon RDS Microsoft SQL Server connector".

   - Provide a name and description for your data source. Leave the default settings as is. Attach a tag if you want to, and click "Next".


3. Configure database details:

   - Host: Your RDS DB endpoint (you can find it in the RDS DB instance's Connectivity and Security options).

   - Port: 1433 (you can find it in the RDS configuration).

   - Instance: Your database name. Once you log in to your RDS instance and navigate to Databases, the name you gave the database inside RDS is your instance.



4. Set up authentication:

   - In Authentication, click "Create and Add new secret" and give your secret a name. The prefix 'AmazonKendra-Amazon-RDS-Microsoft-SQL-Server-' is added for you.

   - Enter the database username (you can find this in the RDS configuration section) and password. Use the same username and password that you use to log in to the MSSQL server.
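Behind the scenes, the console stores these credentials in AWS Secrets Manager. If you want to create the secret yourself, a sketch along the following lines should work. Treat the JSON key names (`username`, `password`) as an assumption to verify against the connector documentation; the helper names and default region are illustrative as well.

```python
import json


def build_secret_string(username: str, password: str) -> str:
    """Serialize database credentials as the JSON document stored in the secret.

    The key names are an assumption; confirm them in the connector docs.
    """
    if not username or not password:
        raise ValueError("both username and password are required")
    return json.dumps({"username": username, "password": password})


def create_secret(name: str, secret_string: str, region: str = "us-east-1") -> str:
    """Create the secret in AWS Secrets Manager and return its ARN."""
    import boto3  # imported lazily so build_secret_string works without AWS access

    client = boto3.client("secretsmanager", region_name=region)
    return client.create_secret(Name=name, SecretString=secret_string)["ARN"]
```

Avoid hard-coding the password in a script; read it from an environment variable or a prompt instead.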


5. Configure VPC and security groups:

   - Configure the VPC and security groups according to your needs. Let us leave the defaults for now. The best practice is to use a private VPC.


6. Set IAM role:

   - In IAM Role, click "Use Existing Role" and select the IAM role that you created while creating the index in Step 1.

7. Configure sync settings:

   - Write a query for the table you want to index, for example SELECT * FROM <YOUR-TABLE-NAME>. Do not put a semicolon at the end, as it results in an error.

   - Provide the primary key, title, and body columns. Make sure the column you provide does not exceed the 10,000-character limit, and in the Body field, enter the name of the column you want to index. To index multiple columns, create a separate data source for each column.

   - For Sync mode, Full sync is recommended. You can change this based on your needs.

   - In the Sync Run Schedule section, the "Run on Demand" option is recommended. Click "Next".
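Since a trailing semicolon in the sync query causes an error, it can help to clean the query before pasting it into the console. The helpers below are a small illustrative sketch (the function names are mine, and the identifier check is deliberately strict, so schema-qualified names such as `dbo.MyTable` are rejected by this simple version).

```python
import re
from typing import List

# Plain SQL identifiers only: letters, digits, and underscores.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def normalize_sync_query(query: str) -> str:
    """Strip surrounding whitespace and any trailing semicolons from a query."""
    cleaned = query.strip()
    while cleaned.endswith(";"):
        cleaned = cleaned[:-1].rstrip()
    if not cleaned:
        raise ValueError("query is empty")
    return cleaned


def build_select_query(table: str, columns: List[str]) -> str:
    """Build a SELECT statement with no trailing semicolon.

    An empty column list selects every column with '*'.
    """
    for name in [table, *columns]:
        if not _IDENTIFIER.match(name):
            raise ValueError(f"not a plain SQL identifier: {name!r}")
    column_list = ", ".join(columns) if columns else "*"
    return f"SELECT {column_list} FROM {table}"


if __name__ == "__main__":
    print(normalize_sync_query("SELECT * FROM Documents; "))
    print(build_select_query("Documents", ["AppID", "RunID", "Text"]))
```

The table name `Documents` is a placeholder; `AppID`, `RunID`, and `Text` are the column names used in the example later in this post.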

8. Configure facet definitions:

   - Navigate to Facet definition in the left navigation pane. In the facet table, add the column names and their data types. For each column you add, choose among Facetable, Searchable, Displayable, and Sortable based on your needs.

   - For example, if you're indexing a column called "AppID", make sure to add AppID to the facet table. Similarly, add every column you want to index to the facet table.
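The same facet definitions can be registered through the API with `update_index` and its `DocumentMetadataConfigurationUpdates` parameter. The sketch below assumes `AppID` is a string column; the helper names, the default flag values, and the region are illustrative choices rather than anything prescribed by Kendra.

```python
from typing import Any, Dict, List

VALID_TYPES = {"STRING_VALUE", "STRING_LIST_VALUE", "LONG_VALUE", "DATE_VALUE"}


def build_facet_definition(
    name: str,
    value_type: str = "STRING_VALUE",
    facetable: bool = True,
    searchable: bool = True,
    displayable: bool = True,
    sortable: bool = False,
) -> Dict[str, Any]:
    """Describe one index field and how it may be used in search."""
    if value_type not in VALID_TYPES:
        raise ValueError(f"unsupported field type: {value_type}")
    return {
        "Name": name,
        "Type": value_type,
        "Search": {
            "Facetable": facetable,
            "Searchable": searchable,
            "Displayable": displayable,
            "Sortable": sortable,
        },
    }


def apply_facets(
    index_id: str, definitions: List[Dict[str, Any]], region: str = "us-east-1"
) -> None:
    """Register the field definitions on an existing index."""
    import boto3  # imported lazily so the builder works without AWS access

    client = boto3.client("kendra", region_name=region)
    client.update_index(Id=index_id, DocumentMetadataConfigurationUpdates=definitions)
```

For the example in this post, `apply_facets(index_id, [build_facet_definition("AppID"), build_facet_definition("RunID")])` would register both columns as string fields.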


9. Set field mapping:

   - In the data source configuration, in the Set Field Mapping section, click "Add field" and add the field for which you created a facet. Leave the rest as is, then click "Next".

   - Consider a scenario: we are indexing the 'Text' column, and in the future we want to retrieve contents from the 'Text' column by specifying 'AppID' and 'RunID'. To achieve this, we need to include the 'AppID' and 'RunID' fields in the field settings when indexing the 'Text' column. If they are not specified during indexing, retrievals based on these parameters may fail.
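To see why the mapped fields matter, here is a sketch of the kind of filtered query the scenario above describes, using the `AttributeFilter` parameter of the Kendra `query` API. It assumes `AppID` and `RunID` were defined as string fields; the helper names and sample values are hypothetical.

```python
from typing import Any, Dict


def build_attribute_filter(app_id: str, run_id: str) -> Dict[str, Any]:
    """Match only documents whose AppID and RunID both equal the given values."""
    return {
        "AndAllFilters": [
            {"EqualsTo": {"Key": "AppID", "Value": {"StringValue": app_id}}},
            {"EqualsTo": {"Key": "RunID", "Value": {"StringValue": run_id}}},
        ]
    }


def build_query_request(
    index_id: str, text: str, app_id: str, run_id: str
) -> Dict[str, Any]:
    """Assemble the keyword arguments for a filtered Kendra query."""
    return {
        "IndexId": index_id,
        "QueryText": text,
        "AttributeFilter": build_attribute_filter(app_id, run_id),
    }


def run_query(request: Dict[str, Any], region: str = "us-east-1") -> Dict[str, Any]:
    """Send the query to Kendra and return the raw response."""
    import boto3  # imported lazily so the builders work without AWS access

    client = boto3.client("kendra", region_name=region)
    return client.query(**request)
```

If `AppID` and `RunID` were never mapped as index fields, a filter like this has nothing to match against, which is the failure mode the scenario warns about.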


10. Review and create:

   - Review the configuration for your data source, and then click "Add Data Source".


Step 3: Syncing the Data Source

1. Sync now:

   - Once the data source is created, you will see the "Sync Now" option. Add the other columns as data sources in the index as needed.


2. Configure IAM roles:

   - Navigate to IAM and click Roles in the left navigation panel. Click the role you created for Kendra. In the Permissions section, click "Add permissions". Granting full access to all services avoids permission errors during sync, which is convenient for a proof of concept; for production use, restrict the role to the permissions the data source actually needs.



3. Run the sync:

   - Now everything is set, and you can sync your database.

   - Once the sync completes successfully, you will see "Completed" in the Sync run history section, along with the number of files scanned, added, deleted, and failed. If the sync has failed, click on the failed files to open the CloudWatch logs for diagnosis. For troubleshooting, refer to the official Amazon Kendra troubleshooting documentation.
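The sync can also be started and monitored from code with `start_data_source_sync_job` and `list_data_source_sync_jobs`. The sketch below is one possible polling loop, with hypothetical helper names and an arbitrary 30-second poll interval. Note that the API reports a finished job as `SUCCEEDED` rather than the "Completed" label shown in the console, and returns the document counts as strings.

```python
import time
from typing import Any, Dict

# Statuses after which a sync job will not change any further.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "ABORTED", "INCOMPLETE"}


def summarize_sync_job(job: Dict[str, Any]) -> Dict[str, Any]:
    """Reduce one sync-history entry to its status and integer document counts."""
    metrics = job.get("Metrics", {})
    return {
        "status": job.get("Status"),
        "scanned": int(metrics.get("DocumentsScanned") or 0),
        "added": int(metrics.get("DocumentsAdded") or 0),
        "deleted": int(metrics.get("DocumentsDeleted") or 0),
        "failed": int(metrics.get("DocumentsFailed") or 0),
    }


def run_sync(
    index_id: str,
    data_source_id: str,
    region: str = "us-east-1",
    poll_seconds: int = 30,
) -> Dict[str, Any]:
    """Start a sync job and poll until it reaches a terminal status."""
    import boto3  # imported lazily so summarize_sync_job works without AWS access

    client = boto3.client("kendra", region_name=region)
    execution_id = client.start_data_source_sync_job(
        Id=data_source_id, IndexId=index_id
    )["ExecutionId"]
    while True:
        history = client.list_data_source_sync_jobs(
            Id=data_source_id, IndexId=index_id
        )["History"]
        job = next((j for j in history if j.get("ExecutionId") == execution_id), None)
        if job is not None and job.get("Status") in TERMINAL_STATUSES:
            return summarize_sync_job(job)
        time.sleep(poll_seconds)
```

A non-zero `failed` count in the summary is the cue to open the CloudWatch logs mentioned above.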



4. Testing:

   - If the sync is successful, you can test it in "Search Indexed Content" in the left navigation pane.
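The same test can be run programmatically with the Kendra `query` API. The following sketch sends a plain-text query and pulls out the title, excerpt, and confidence label of each result; the helper names and the default region are illustrative assumptions.

```python
from typing import Any, Dict, List


def extract_results(response: Dict[str, Any]) -> List[Dict[str, str]]:
    """Pull the title, excerpt, and confidence label out of a query response."""
    results = []
    for item in response.get("ResultItems", []):
        results.append(
            {
                "title": item.get("DocumentTitle", {}).get("Text", ""),
                "excerpt": item.get("DocumentExcerpt", {}).get("Text", ""),
                "confidence": item.get("ScoreAttributes", {}).get(
                    "ScoreConfidence", "NOT_AVAILABLE"
                ),
            }
        )
    return results


def search_index(
    index_id: str, text: str, region: str = "us-east-1"
) -> List[Dict[str, str]]:
    """Query the index with plain text and return simplified results."""
    import boto3  # imported lazily so extract_results works without AWS access

    client = boto3.client("kendra", region_name=region)
    return extract_results(client.query(IndexId=index_id, QueryText=text))
```

An empty list from `search_index` right after a successful sync usually just means the query did not match anything, so try a phrase you know appears in the indexed column.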


Similarly, you can create up to 5 indexes per account, and you can attach any of the many available data source types based on your requirements.

Amazon Kendra's sophisticated approach to indexing and vectorization sets it apart as a revolutionary tool in the realm of information retrieval. By leveraging advanced machine learning algorithms, Kendra not only indexes data efficiently but also understands the nuances of human language, enabling it to provide accurate and contextually relevant search results.

As we've explored in this blog, the processes of indexing and vectorization are crucial to Kendra's ability to deliver high-quality search experiences. These capabilities allow organizations to unlock the full potential of their data, transforming the way they access and utilize information.

Whether you're looking to improve internal knowledge management, enhance customer support with precise information retrieval, or streamline access to extensive documentation, Amazon Kendra offers a powerful solution tailored to meet your needs. By integrating Kendra into your systems, you can ensure that your team and users can quickly find the information they need, boosting productivity and decision-making.

As you embark on your journey with Amazon Kendra, remember that understanding the underlying technology is key to maximizing its benefits. We hope this blog has provided you with valuable insights into the world of indexing and vectorization, and how these processes drive Kendra's exceptional search capabilities.

Special thanks to Praveen Desai, Darshan Raviprakash, and Liam Rundle for their invaluable assistance in understanding and implementing the Kendra Index. Your support and encouragement were instrumental in the creation and publication of this blog.

Stay tuned for more in-depth articles and tutorials as we continue to explore the fascinating features of Amazon Kendra and how it can transform your organization's approach to information retrieval.


Thank you, and Happy Learning!


