Oracle Database 23ai: Oracle Analytics Cloud, Machine Learning, Anomaly Detection, Generative AI for Environment Data Analysis and Retail Industries.
In this part 3 article on the Oracle Database 23ai series, we will see that Data is a Gold mine and how to analyse data with Oracle Machine Learning & Oracle Analytics Cloud. We will leverage Oracle Generative AI to understand what is impacting the environment globally. We will drill through data related to carbon emissions, nitrogen emissions, greenhouse gas impacts, water abstractions, deforestation, etc. We will also analyse retail customer data and create Anomaly Detection machine learning models to segment the demographic data into clusters and examine the customer data anomalies.?
In article part 1, we saw how to create Oracle Database 23ai infrastructure; in part 2, we saw how to search Internal knowledge repository through retrieval augmentation generation, AI Vector Search and Generative AI.?
My previous articles related to this thread are:
In this article let us look into?the following:
1. Look into the environment and climate changes.?
Carbon?dioxide (CO2) emissions come from burning fossil fuels and making cement.?These emissions also include carbon dioxide produced when consuming solid, liquid, and gas fuels and gas flaring. Carbon dioxide emissions are the primary driver of global climate change.?Preventing or reducing CO2 emissions is very important to avoid the worst impacts of climate change.??
Nitrogen dioxide (NO2) is?a reddish-brown, acidic, corrosive, and highly reactive gas emitted from the combustion of fossil fuels, such as coal, gas, and oil.?It can also be produced when nitric oxide (NO) reacts with other chemicals in the air, such as ozone, in the presence of sunlight.
Deforestation is the intentional removal of trees and forests, which releases carbon dioxide into the atmosphere and contributes to climate change. Carbon dioxide is a greenhouse gas that traps heat in the lower atmosphere, which causes global warming. Deforestation is the second largest source of carbon dioxide emissions from human activities, after burning fossil fuels, and is responsible for about 12–20% of global greenhouse gas emissions
2. Create Oracle APEX Application and access Database.?
Log in to?Oracle APEX?and create an application. If you have not yet created an Oracle APEX environment, please refer to?the article in part 1?of this series.
Access SQL commands under SQL Workshop.
Alternatively, you can?skip Oracle APEX entirely and access the database after SSHing it into your compute instance and connecting it to the database as the docuser created in the?first article?of this series.???
keys % ssh -i <your-private-key>.key opc@<your-public-ip>
sqlplus docuser/<db-password>@localhost:1521/FREEPDB1
Alternatively, Install SQL Developer VSCode plugin
Create a Database Connection where the hostname will be your public IP address. Save, Connect and Open worksheet to run SQL commands.
3. Create Environment schema and import data.
Download?env_forest_area.csv?dataset from my GitHub repo. This is curated and simplified version of the original source at?Global Forest Resource Assessment?and?Global Forest Resources Assessment 2020
Under?SQL Workshop?>?Data Workshop, Upload the downloaded CSV file here.
On the Oracle APEX page, create a region of type chart. The table name is the ENV_FOREST_AREA that we just created
Under Series we can choose country and year as shown below.
Create a Select List of Items for countries. and Create a dynamic action under this item for an onChange event to refresh the chart
select country as v, country as d from ENV_FOREST_AREA order by country
Similarly, create other series and choose different years from the year column of the database table.?So?our chart should look like this for all years?when we run the page.
4. Create Generative AI for Forestation related information.
This section is Optional, you can do step 4 or skip it.
Create a Generative AI Function, as shown in?this code base. Invoke the Generative AI function from the Oracle APEX PL/SQL region, as shown below. You can re-engineer the prompts for different inputs as required.
Please ensure that OCI Generative AI Credentials "OCI_CRED" has been setup as shown in part 2 article
DECLARE
-- Get Country Name
v_input varchar2(400) := :P1_COUNTRY;
v_n varchar2(4000);
BEGIN
v_input := 'Forestation in '||v_input;
v_n := genai_function ( v_input );
END;
Apply filter by country name from year 1990, 2000, 2010, 2015, 2020
Brazil once had the highest deforestation rate in the world and in 2005 still had the largest area of forest removed annually.Since 1970, over 700,000 square kilometres (270,000 sq mi) of the Amazon rainforest have been destroyed. In 2001, the Amazon was approximately 5,400,000 square kilometres (2,100,000 sq mi), which is only 87% of the Amazon's original size. According to official data, about 729,000 km2 have already been deforested in the Amazon biome, which corresponds to 17% of the total. 300,000 km2 have been deforested in the last 20 years.
Currently Ireland has a total forest cover of approximately 1,789,000 acres equivalent to 10.5% of the national area. As of 2023, Ireland has 11.6% forest cover, which is one of the lowest rates in Europe. This is a significant decrease from thousands of years ago, when more than 80% of the island was covered in trees. By 1925, only 1% of Ireland was forested. (source - wiki)
Lets now move from Oracle APEX based reporting/charts to Oracle Analytics cloud based Data Visualisation to get more insight, without writing any SQL queries to do this.
5. Analyse historical environmental data such as carbon, nitrogen, and greenhouse gas emissions?in Oracle APEX.
Let us now import few more datasets shown below into Oracle Database 23ai, the process to create table from CSV file uploaded is shown in step 3.
Confirm if the data is properly imported, You can do this under SQL Workshop > Object Browser
6. Connect Oracle Database 23ai with Oracle Analytics Cloud
Login to Oracle Analytics Cloud, the URL will be as shown below.
https://<your-instance-name>-<your-tenancy-name>-<region-identifier>.analytics.ocp.oraclecloud.com/ui/dv/?pageid=console
If you do have an Oracle Analytics Cloud instance, you can create it by logging into?cloud.oracle.com
from here click on Analytics Home Page button.
On the left Navigation, click on Console. And click on Create Database Connection to our Oracle Database 23ai. Test and Save connection. Where host is a public IP address, enter the Database username and password (use the same schema for which we created tables in Oracle APEX)
You can also create a connection under the Data tab in the left navigation
From the top Navigation, click on Create next to 3 dots.?
Select our Oracle Database 23ai connection
We will be able to browse through tables.
Select the table of our choice; in this case, select SO2_EMISSION (or SOX_EMISSION), save the Dataset as SOX_DS in the top navigation, and click the back navigation button to save this.
7. Create Reports and Auto Generate Data Insights in Oracle Analytics Cloud.?
We can see the new Dataset just created. Double click on it.
Click on the small bulb icon on top to view Auto Insights; add this on the design canvas, and here we can see the Top 10 Sulfur oxide (SOx) emissions countries
Working with Data Visualisation and Oracle Analytics Cloud is an Art. You can create endless?types of?reports based on various database columns; here, I.?Here,?I have selected the four best visualizations for my analysis: feel.?Feel free to experiment with more types of Visualizations.
领英推荐
8. Adding Location Map to Visualisation
Let us add one more Canvas to this and select Country and COx emission
You can adjust the properties of Map or any other chart as shown below,
In the top Navigation, you can see?the Save?button. Click on it and?Present.?
Other environment reports generated are shown towards the end of this article.?
We will now move towards the next section, the next?industry.
9. Oracle Machine Learning Algorithms for Anomaly Detection
Expectation Maximization (EM) is a density estimation technique. Oracle Machine Learning implements EM as a distribution-based clustering algorithm that uses probability density estimation. In density estimation, the goal is to construct a density function that captures how a given population is distributed. The density estimate is based on observed data that represents a sample of the population.
Anomaly Detection (AD) identifies cases that are unusual within data that is apparently homogeneous. Anomaly Detection is an important tool for fraud detection, network intrusion, and other rare events that may have great significance but are hard to find.
Oracle Machine Learning uses Support Vector Machine (SVM) as the one-class classifier for Anomaly Detection (AD). When SVM is used for anomaly detection, it has the classification mining function but no target.
There are two ways to search for anomalies:
10. Retail Scenario for Anomaly Detection
Problem statement: Segment the demographic data into clusters and examine the anomalies.
Demographic segmentation groups customers and potential customers together by focusing on certain traits such as age, gender, income, occupation & family status. EM can capture the underlying data distribution and thus flag records that do not fit the learned data distribution well. An object is identified as an outlier in an EM Anomaly Detection model if its anomaly probability exceeds 0.5. The customer and demographics data is used to predict anomalous customers using prob anomalous.
11. Create a Machine Learning user and grant the required resources.
Since we already have created a docuser, who also has Oracle APEX workspace we will re-use this database user for Machine learning, if you would like to know how to create database user, please refer?part 1 article?of this series.
Connect to Oracle Database 23ai as?SYS?user.
# sqlplus sys/<your-password>@localhost:1521/FREEPDB1 as sysdba
SQL> GRANT CREATE MINING MODEL TO docuser;
SQL> GRANT EXECUTE ON CTXSYS.CTX_DDL TO docuser;
SQL> GRANT SELECT ANY MINING MODEL TO docuser;
-- add any other grants if required --
Now login as docuser, download the csv files and import following tables.
12. Create Machine Learning Model to detect anomalies in retail customer data.
Step 1:?Create views from the Supplementary Demographics and Customers table.
CREATE OR REPLACE VIEW DEMOGRAPHICS_V AS
SELECT CUST_ID, EDUCATION, OCCUPATION, AFFINITY_CARD, HOUSEHOLD_SIZE, YRS_RESIDENCE, CRICKET FROM SUPPLEMENTARY_DEMOGRAPHICS;
CREATE OR REPLACE VIEW CUSTOMERS360_V AS
SELECT a.CUST_ID, a.CUST_GENDER, a.CUST_MARITAL_STATUS, a.CUST_YEAR_OF_BIRTH, a.CUST_INCOME_LEVEL, a.CUST_CREDIT_LIMIT, a.CUST_FIRST_NAME, a.CUST_LAST_NAME,
b.EDUCATION, b.AFFINITY_CARD, b.HOUSEHOLD_SIZE, b.OCCUPATION, b.YRS_RESIDENCE, b.CRICKET FROM CUSTOMERS a, DEMOGRAPHICS_V b
WHERE a.CUST_ID = b.CUST_ID;
Step 2: Create EM model with CREATE MODEL2
The CREATE_MODEL2 procedure of the DBMS_DATA_MINING package is a procedure for defining model settings to build a model. By using the CREATE_MODEL2 procedure, the user does not need to create transient database objects. The model can use configuration settings and user-specified transformations. In the CREATE_MODEL2 procedure, the input is a table or a view and if such an object is not already present, the user must create it.
-- Drop Model if it already exists
BEGIN DBMS_DATA_MINING.DROP_MODEL('CUSTOMERS360MODEL_AD');
EXCEPTION WHEN OTHERS THEN NULL; END;
-- Create Model by name CUSTOMERS360MODEL_AD
DECLARE
v_setlst DBMS_DATA_MINING.SETTING_LIST;
BEGIN
v_setlst('ALGO_NAME') := 'ALGO_EXPECTATION_MAXIMIZATION';
v_setlst('PREP_AUTO') := 'ON';
v_setlst('EMCS_OUTLIER_RATE') := '0.1';
DBMS_DATA_MINING.CREATE_MODEL2(
MODEL_NAME => 'CUSTOMERS360MODEL_AD',
MINING_FUNCTION => 'CLASSIFICATION',
DATA_QUERY => 'SELECT * FROM CUSTOMERS360_V',
CASE_ID_COLUMN_NAME => 'CUST_ID',
SET_LIST => v_setlst,
TARGET_COLUMN_NAME => NULL); -- NULL target indicates anomaly detection
END;
/
SELECT * FROM TABLE(dbms_data_mining.get_model_details_global('CUSTOMERS360MODEL_AD'))
ORDER BY global_detail_name;
Step 3:?Display the Top 5 Anomalous customers and create a table view
CREATE OR REPLACE VIEW top_5_anomalous_customers AS
SELECT *
FROM (SELECT CUST_ID, round(prob_anomalous,2) prob_anomalous,
YRS_RESIDENCE, CUST_MARITAL_STATUS, CUST_GENDER, CUST_FIRST_NAME, CUST_LAST_NAME,
CUST_YEAR_OF_BIRTH, CUST_INCOME_LEVEL, CUST_CREDIT_LIMIT,
EDUCATION, AFFINITY_CARD, OCCUPATION,
rank() over (ORDER BY prob_anomalous DESC) rnk
FROM (
SELECT CUST_ID, HOUSEHOLD_SIZE, YRS_RESIDENCE, CUST_GENDER, CUST_MARITAL_STATUS, CUST_FIRST_NAME, CUST_LAST_NAME,
CUST_YEAR_OF_BIRTH, CUST_INCOME_LEVEL, CUST_CREDIT_LIMIT,
EDUCATION, AFFINITY_CARD, OCCUPATION,
prediction_probability(CUSTOMERS360MODEL_AD, '0' USING *) prob_anomalous
FROM CUSTOMERS360_V))
WHERE rnk <= 5
ORDER BY prob_anomalous DESC
13. Create Oracle Analytics Cloud report of anomalous and regular customers.
Create Customers360 View Dataset
Create Top 5 Anomalous Customer Dataset
Create Anomalous Report based on Analytics Insight
Report View
Similarly, Check for regular customer reports.
14. Additional Environment Reports
Water abstractions refer to freshwater taken from ground. or surface water sources, either permanently or. temporarily, and conveyed to the place of use. Groundwater abstraction is the process of extracting water from an underground aquifer, either temporarily or permanently. The water is usually pumped out through wells or boreholes, and the water table lowers around the borehole as it's extracted
Greenhouse gas (GHG) emissions are?gases released into the atmosphere that trap heat and contribute to climate change.?GHGs absorb more prolonged wavelength radiation, which increases atmospheric temperatures.?Some examples of GHGs include carbon dioxide (CO2), nitrogen oxide (N2O), water vapour (HO), and fluorinated gases.
Conclusion:?We saw in this article how to extend our Oracle Database 23ai to run through Environment and Climate change reports related to Carbon emission, Deforestation, Nitrogen emission, Groundwater reduction and much more. We used Oracle Generative AI to get more insights into these environment reports. We also saw how to use Oracle Machine Learning models to detect anomalies in retail customer data. We imported these models and data into the Oracle Analytics cloud to create various business and executive reports. This creates endless possibilities across different industry verticals such as Healthcare, Retail, Transportation, Environment life science, etc.?
Thank you.
Thanks for reading, liking and sharing
Regards, Madhusudhan Rao
My Related Articles:
References:
GEN AI Evangelist | #TechSherpa | #LiftOthersUp
6 个月Intriguing insights. Let's explore data's power to drive sustainability. Combining analytics and AI could reveal surprising breakthroughs. Madhusudhan Rao