Evaluating Python #1

Determining Password Strength Using Natural Language Processing

In this article, I would like to discuss my first portfolio project, Determining Password Strength Using NLP. I will describe the project in the STAR format.

S – Situation

T – Task

A – Action

R – Result

About the dataset: -

The dataset contains more than 600,000 passwords of varying strength. The strength of each password is indicated by a number.

Situation: -

A password is the first layer of authentication for any secured device, account, and so on. A user should create a strong password so that their data is safe from unauthorized users. With this in mind, I will try to predict how strong a given password is.

Task: -

  • As usual when using Python for data analysis, we import the data using the Pandas library.
  • To check the unique values in the strength column of the data, we use the unique() method.
  • To clean the data, we need to look for null/NaN values. The presence of null values reduces the accuracy score of the model.
  • After cleaning the data as well as we can, we shuffle it. Shuffling helps make the model more robust.
  • The key part of the project is splitting. We need a function that splits each password into a list of characters.
  • After creating the split function, we import the TF-IDF vectorizer. The goal of using TF-IDF is to scale down the impact of tokens that occur very frequently in a given corpus and that are hence empirically less informative than features that occur in a small fraction of the training corpus.
  • Term Frequency (TF) – It tells us how frequently a term occurs in a document.

TF(t) = (Number of times term t appears in a document) / (Total number of terms in the document)

  • Inverse Document Frequency (IDF) – It tells us the weight of rare words. Words that occur rarely in the corpus have a high IDF score.

IDF(t) = log((Total number of documents) / (Number of documents with term t in it))

  • TF-IDF example: Let us take two sentences.
    Sentence 1 – earth is the third planet from the sun
    Sentence 2 – Jupiter is the largest planet
    We will now calculate the TF-IDF scores:

[Image: table of TF-IDF scores for each word in the two sentences]

  • From the above table, we can say that TF-IDF is zero for common words, which shows that they are not significant. On the other hand, TF-IDF is non-zero for significant words such as 'earth', 'third', 'sun', 'largest' and 'Jupiter'.
  • After applying the TF-IDF vectorizer, we split the data for our model.
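
The TF and IDF formulas above can be checked by hand on the two example sentences. The following is a minimal sketch in plain Python, not the code used in the project. The function names are made up for illustration, and the natural logarithm is an assumption, since the original table may have used a different base.

```python
import math
from collections import Counter

# The two example sentences from the article, lowercased and split on spaces.
documents = [
    "earth is the third planet from the sun",
    "jupiter is the largest planet",
]
tokenized = [doc.split() for doc in documents]


def tf(term, tokens):
    """Term frequency: occurrences of `term` divided by the number of tokens."""
    return Counter(tokens)[term] / len(tokens)


def idf(term, corpus):
    """Inverse document frequency: log(total documents / documents containing `term`).

    Assumes `term` occurs in at least one document (otherwise this divides by zero).
    The natural logarithm is an assumption made for this sketch.
    """
    containing = sum(1 for tokens in corpus if term in tokens)
    return math.log(len(corpus) / containing)


def tf_idf(term, tokens, corpus):
    """TF-IDF score of `term` within one tokenized document."""
    return tf(term, tokens) * idf(term, corpus)


# Print the score of every distinct word in each sentence.
for number, tokens in enumerate(tokenized, start=1):
    for term in sorted(set(tokens)):
        score = tf_idf(term, tokens, tokenized)
        print(f"sentence {number}: {term:<8} {score:.4f}")
```

Words that appear in both sentences ('is', 'the', 'planet') get an IDF of log(2/2) = 0, so their TF-IDF score is zero, while words that appear in only one sentence get a positive score.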

Action: -

  • Training set - A subset used to train the model. Testing set - A subset used to test the model.
  • We split the data in an 80-20 ratio, but you can split it in other ratios as well.
  • We use Logistic Regression because we are classifying passwords. We set the multi_class parameter to 'multinomial' since there are more than two classes in the data, namely 0, 1 and 2. This is why we treat it as a case of multinomial Logistic Regression.
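
The steps described so far could be put together roughly as follows. This is a sketch under several assumptions, not the project's actual code: it assumes scikit-learn is the library behind the vectorizer and the classifier, the column names `password` and `strength` and the helper `split_into_characters` are hypothetical, and the handful of sample rows is made up to stand in for the real dataset.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Made-up sample rows standing in for the real dataset (hypothetical values).
rows = [
    ("12345", 0), ("abcde", 0), ("qwerty", 0), ("11111", 0), ("hello", 0),
    ("sunshine99", 1), ("monkey2020", 1), ("dragon1234", 1),
    ("letmein777", 1), ("football88", 1),
    ("Xy#9!kLm@2Qw$7", 2), ("T7&vRq!8zP#4mD", 2), ("gH5$wN2@eK9!xB", 2),
    ("Lp3#Zs8&Qc1!Vf", 2), ("bN6@Yt4$Ru0!Ja", 2),
    (None, 0),  # a row with a missing password, to be cleaned out
]
data = pd.DataFrame(rows, columns=["password", "strength"])

# Inspect the distinct strength labels and count the missing values.
print(data["strength"].unique())
print(data.isnull().sum())

# Drop rows with null values, then shuffle the remaining rows.
data = data.dropna()
data = data.sample(frac=1, random_state=42).reset_index(drop=True)


def split_into_characters(password):
    """Split a password into a list of its individual characters."""
    return list(password)


# Each character is treated as one token for TF-IDF.
vectorizer = TfidfVectorizer(tokenizer=split_into_characters)
X = vectorizer.fit_transform(data["password"])
y = data["strength"]

# 80-20 split. `stratify` is added only because this sample is tiny;
# it keeps every class present in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Recent scikit-learn versions fit a multinomial model by default when there
# are more than two classes, so multi_class is not passed explicitly here.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Predict the strength class of a new password.
prediction = model.predict(vectorizer.transform(["%@123abcd"]))[0]
print(prediction)
```

With so few rows the predicted class carries no meaning; the sketch only shows how the pieces fit together.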

Result: -

  • Here, the password '%@123abcd' that I entered shows a password strength of 1, i.e., it is of weak strength.
  • Checking the accuracy using the confusion matrix, it came out to be 82.06%.
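
To illustrate how an accuracy figure follows from a confusion matrix, here is a small sketch in plain Python. The labels below are made up for the example and are unrelated to the 82.06% result above; the function names are also hypothetical.

```python
def confusion_matrix(actual, predicted, labels):
    """Build a matrix whose rows are actual classes and columns are predicted classes."""
    index = {label: position for position, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def accuracy_from_matrix(matrix):
    """Accuracy = correctly classified samples (the diagonal) / all samples."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Made-up true and predicted strength classes for ten passwords.
actual = [0, 0, 1, 1, 1, 1, 2, 2, 1, 0]
predicted = [0, 1, 1, 1, 1, 2, 2, 2, 1, 1]

matrix = confusion_matrix(actual, predicted, labels=[0, 1, 2])
accuracy = accuracy_from_matrix(matrix)

for row in matrix:
    print(row)
print(f"accuracy = {accuracy:.2%}")
```

The diagonal of the matrix holds the correctly classified passwords, so dividing its sum by the total number of samples gives the accuracy.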

Thanks for reading.....

To get to know more about this project: https://debjyotisaha1998.wixsite.com/myportfolio/password-strength

To know more about me: https://debjyotisaha1998.wixsite.com/myportfolio
