ML.NET & C# => Easier than Walking.

Last week I talked with a colleague about Machine Learning and how we could apply it in one of our clients' projects, and that is when the idea for this article was born ;).

I created a straightforward project with the complete solution already implemented; you can download it from: https://github.com/ljscodex/MLForeCastingTemperature

However, the idea of this article is to start a project from scratch and walk you through the whole process.

Let's start working.

You can use any kind of project you want; I chose the simple ASP.NET Core Web API project template.

Before continuing, note that we are going to use two packages in this example:

Microsoft.Extensions.ML and Microsoft.ML.TimeSeries. You can install them from the NuGet Package Manager, or from the terminal by running dotnet add package {PackageName}.

Something important: this example uses a time-series model, because we are going to predict future values based on past observations.

To keep this article simple, let's add this code at the end of the Program.cs file.


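    // Add these using directives at the top of Program.cs.
    using Microsoft.ML;
    using Microsoft.ML.Data;
    using Microsoft.ML.Transforms.TimeSeries;

    // A public method cannot be declared among the top-level statements of Program.cs,
    // so it lives in a static class (the class name is only illustrative).
    public static class TemperatureForecaster
    {
        public static WeatherByDateForeCast PredictTemperature(string? Station = null)
        {
            var context = new MLContext();

            // Path.Combine builds a path that works on Windows, Linux and macOS.
            var filename = Path.Combine(AppContext.BaseDirectory, "Data", "newDatos temperatura Argentina.csv");
            var data = context.Data.LoadFromTextFile<WeatherByDate>(filename,
                hasHeader: true, separatorChar: ';');

            IDataView filteredData;

            if (Station is not null)
            {
                // FilterByCustomPredicate DROPS the rows for which the predicate returns true,
                // so only the rows of the requested station are kept.
                filteredData = context.Data.FilterByCustomPredicate<WeatherByDate>(data,
                    x => !string.Equals(x.Station?.Trim(), Station.Trim(), StringComparison.OrdinalIgnoreCase));
            }
            else { filteredData = data; }

            // Here you can preview the information; it can be removed.
            var preview = filteredData.Preview();

            // Number of rows that actually reach training. File.ReadLines(filename).Count()
            // would also count the header line and the rows of every other station.
            int rowCount = context.Data
                .CreateEnumerable<WeatherByDate>(filteredData, reuseRowObject: false)
                .Count();

            var pipeline = context.Forecasting.ForecastBySsa(
                nameof(WeatherByDateForeCast.ForecastTemp),  // output column
                nameof(WeatherByDate.TempMax),               // input series
                windowSize: 5,
                seriesLength: rowCount,
                trainSize: rowCount,
                horizon: 5,
                confidenceLevel: 0.99f
                //confidenceLowerBoundColumn: "ForeCastLowestValue",
                //confidenceUpperBoundColumn: "ForeCastUpperValue"
                );

            var model = pipeline.Fit(filteredData);

            var forecastingEngine = model.CreateTimeSeriesEngine<WeatherByDate, WeatherByDateForeCast>(context);

            // Returns the next `horizon` (5) values of TempMax.
            return forecastingEngine.Predict();
        }
    }

    public class WeatherByDate
    {
        [LoadColumn(0)]
        public float Date { get; set; }   // not read by ForecastBySsa, which uses row order

        [LoadColumn(1)]
        public float TempMax { get; set; }

        [LoadColumn(2)]
        public float TempMin { get; set; }

        [LoadColumn(3)]
        public string Station { get; set; }
    }

    public class WeatherByDateForeCast
    {
        public float[] ForecastTemp { get; set; }
        //public float[] ForeCastLowestValue { get; set; }
        //public float[] ForeCastUpperValue { get; set; }
    }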

Let me explain this code by breaking down its main components:

Function Signature:

The method PredictTemperature takes an optional string parameter named Station. It returns a WeatherByDateForeCast object, whose ForecastTemp array holds the forecasted maximum temperatures.

MLContext Initialization:

    var context = new MLContext();

This initializes a new instance of MLContext, which is the starting point for all ML.NET operations.

Loading Data:

The dataset is loaded from a CSV file named newDatos temperatura Argentina.csv into an IDataView object.

    var filename = Path.Combine(AppContext.BaseDirectory, "Data", "newDatos temperatura Argentina.csv");
    var data = context.Data.LoadFromTextFile<WeatherByDate>(filename,
        hasHeader: true, separatorChar: ';');

LoadFromTextFile takes the data schema from the WeatherByDate class (through its [LoadColumn] attributes), and its arguments indicate that the file has a header row and uses a semicolon (;) as the separator character.

You can download the file here: https://github.com/ljscodex/MLForeCastingTemperature/blob/master/MLForeCastingTemperature/Data/newDatos%20temperatura%20Argentina.csv
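To make the schema mapping concrete, here is a plain C# sketch (it does not use ML.NET) of what [LoadColumn(n)] together with hasHeader: true and separatorChar: ';' describe: skip the header row, split each line on ';', and read each field by its column index. The rows below are made up and only assume the column order Date;TempMax;TempMin;Station declared in WeatherByDate; the date values are placeholders because the real file's date format is not shown here.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Made-up sample in the column order assumed by WeatherByDate:
// 0 = Date, 1 = TempMax, 2 = TempMin, 3 = Station.
string[] lines =
{
    "Date;TempMax;TempMin;Station",   // header row (hasHeader: true)
    "d1;31.5;18.0;STATION A",
    "d2;29.0;17.5;STATION B",
};

// Skip the header, split on ';' and read each field by its column index,
// mirroring what [LoadColumn(n)] plus separatorChar: ';' describe.
var rows = lines
    .Skip(1)
    .Select(line => line.Split(';'))
    .Select(f => (
        Date: f[0],   // kept as text; the real date format is an unknown here
        TempMax: float.Parse(f[1], CultureInfo.InvariantCulture),
        TempMin: float.Parse(f[2], CultureInfo.InvariantCulture),
        Station: f[3]))
    .ToList();

foreach (var r in rows)
    Console.WriteLine($"{r.Station}: max {r.TempMax}, min {r.TempMin}");
```

CultureInfo.InvariantCulture keeps the sketch independent of the machine's regional settings, for example an es-AR locale, which uses a comma as the decimal separator.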

Data Filtering (Optional):

If a Station name is provided, the dataset is filtered so that only the rows from that station remain.

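    Microsoft.ML.IDataView filteredData;

    if (Station is not null)
    {
        // FilterByCustomPredicate drops the rows for which the predicate returns true,
        // so only the requested station's rows are kept.
        filteredData = context.Data.FilterByCustomPredicate<WeatherByDate>(data,
            x => !string.Equals(x.Station?.Trim(), Station.Trim(), StringComparison.OrdinalIgnoreCase));
    }
    else { filteredData = data; }

    // Here you can preview the information; it can be removed.
    var preview = filteredData.Preview();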

This is done with the FilterByCustomPredicate method, which drops every row for which the predicate returns true. Since the predicate returns true when a row's Station does not match the requested name, only that station's records are kept.
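Because the predicate marks the rows to drop rather than the rows to keep, its effect is easy to read backwards. The sketch below reproduces that contract with LINQ on an in-memory list (it is not ML.NET code, and the station names and values are made up): rows for which the predicate returns true are removed, so the != test keeps only the requested station. It also shows why a count of the raw file lines, which includes the header and every station, overstates the number of rows that reach training.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Made-up in-memory rows: (Station, TempMax).
var data = new List<(string Station, float TempMax)>
{
    ("STATION A", 31.5f),
    ("STATION B", 29.0f),
    ("station a ", 30.0f),   // different casing and a trailing space
};

string requested = "Station A";

// Equivalent to the article's != test, written with an ordinal,
// case-insensitive comparison: true means "not the requested station".
bool ShouldDrop((string Station, float TempMax) x) =>
    !string.Equals(x.Station.Trim(), requested.Trim(), StringComparison.OrdinalIgnoreCase);

// FilterByCustomPredicate drops the rows for which the predicate returns true,
// so only the requested station's rows survive.
var kept = data.Where(x => !ShouldDrop(x)).ToList();

// What File.ReadLines(...).Count() would see: the header plus all rows.
int fileLineCount = data.Count + 1;

Console.WriteLine($"Kept {kept.Count} of {data.Count} rows; raw line count would be {fileLineCount}.");
```

Here two rows survive while the raw line count is four, which is the gap between counting file lines and counting the rows that are actually trained on.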

Forecasting Pipeline:

    // Count the rows that reach training (the filtered data, without the header line).
    int rowCount = context.Data
        .CreateEnumerable<WeatherByDate>(filteredData, reuseRowObject: false)
        .Count();

    var pipeline = context.Forecasting.ForecastBySsa(
        nameof(WeatherByDateForeCast.ForecastTemp),
        nameof(WeatherByDate.TempMax),
        windowSize: 5,
        seriesLength: rowCount,
        trainSize: rowCount,
        horizon: 5,
        confidenceLevel: 0.99f
        //confidenceLowerBoundColumn: "ForeCastLowestValue",
        //confidenceUpperBoundColumn: "ForeCastUpperValue"
        );

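A forecasting pipeline is created with the ForecastBySsa method, which uses Singular Spectrum Analysis (SSA) for univariate time-series forecasting. SSA reads only the input column (TempMax) in row order and ignores Date, so the rows in the file must already be sorted chronologically.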

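This method is configured with several parameters:

nameof(WeatherByDateForeCast.ForecastTemp) and nameof(WeatherByDate.TempMax) name the output forecast column and the input time-series column, respectively.

windowSize: 5 is the lagged window size, that is, how many consecutive values SSA analyzes together.

seriesLength is the number of values kept in the buffer for modeling, trainSize is the number of values from the beginning of the series used for training (both are set here to the size of the data set), and horizon: 5 is the number of values to forecast.

confidenceLevel: 0.99f sets the confidence level of the prediction intervals. Those intervals are only produced when confidenceLowerBoundColumn and confidenceUpperBoundColumn are also set; they are commented out in this example, so the setting has no effect here.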

You can see more details at https://learn.microsoft.com/en-us/dotnet/api/microsoft.ml.timeseriescatalog.forecastbyssa?view=ml-dotnet-2.0.0
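To show what windowSize: 5 means in practice, the sketch below builds the trajectory (Hankel) matrix that SSA starts from: each column is one window of windowSize consecutive values, and a series of length N yields N - windowSize + 1 columns. This covers only SSA's first step, written in plain C# with made-up temperatures; ML.NET then decomposes this matrix and uses the result to extrapolate horizon values ahead, which is not reproduced here.

```csharp
using System;

// Made-up daily maximum temperatures, in chronological order.
float[] series = { 30f, 31f, 29f, 28f, 32f, 33f, 31f, 30f };
int windowSize = 5;   // the same value the article passes to ForecastBySsa

float[,] trajectory = BuildTrajectoryMatrix(series, windowSize);

// Column j holds series[j] .. series[j + windowSize - 1].
Console.WriteLine($"{trajectory.GetLength(0)} x {trajectory.GetLength(1)} trajectory matrix");

static float[,] BuildTrajectoryMatrix(float[] x, int window)
{
    if (window < 2 || window > x.Length)
        throw new ArgumentOutOfRangeException(nameof(window));

    int columns = x.Length - window + 1;   // K = N - L + 1
    var m = new float[window, columns];
    for (int i = 0; i < window; i++)
        for (int j = 0; j < columns; j++)
            m[i, j] = x[i + j];             // Hankel structure: constant along anti-diagonals
    return m;
}
```

With 8 values and a window of 5 the matrix has 5 rows and 4 columns; a larger windowSize captures longer patterns but leaves fewer windows to learn from.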

Training the Model:

    var model = pipeline.Fit(filteredData);

Calling Fit on the pipeline with the (possibly filtered) data trains the forecasting model.

Forecasting:

    var forecastingEngine = model.CreateTimeSeriesEngine<WeatherByDate, WeatherByDateForeCast>(context);

A time-series prediction engine is created from the trained model using CreateTimeSeriesEngine. This engine is then used to make forecasts with the Predict method.

Returning Forecast:

    var forecasts = forecastingEngine.Predict();
    return forecasts;

The forecast produced by the prediction engine is returned as a WeatherByDateForeCast object, whose ForecastTemp array contains one value per forecast step (5, as set by horizon).
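Because the project is an ASP.NET Core Web API, a controller that returns this WeatherByDateForeCast object sends it to the client as JSON. ASP.NET Core's default System.Text.Json options are based on JsonSerializerDefaults.Web (camelCase property names), so the sketch below approximates the response body. It uses an anonymous object with five made-up values in place of the real class, one per step of horizon: 5.

```csharp
using System;
using System.Text.Json;

// Stand-in for a WeatherByDateForeCast instance; the five values are made up,
// one per forecast step (horizon: 5).
var forecast = new { ForecastTemp = new float[] { 30.5f, 31f, 29.5f, 29f, 30f } };

// JsonSerializerDefaults.Web: camelCase property names, case-insensitive reads.
var options = new JsonSerializerOptions(JsonSerializerDefaults.Web);
string json = JsonSerializer.Serialize(forecast, options);

Console.WriteLine(json);
```

A client should therefore read the property as forecastTemp, not ForecastTemp, unless the API's JSON options are changed.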


Overall, the code shows a complete, if minimal, approach to time-series forecasting in ML.NET: it uses Singular Spectrum Analysis (SSA) to forecast future temperature values from past observations. The option to filter by station name lets you build a forecast for a single station, which is useful when temperatures vary significantly between stations or when you need a forecast for one specific station.








More articles by Leonardo S.
