PoS Regex Pattern matching for ML based Tagging

Rajeev Gangal

Published: August 5, 2020

using CSV
using DataFrames

# Input PoS file for feature calculation, passed as the first command-line argument,
# e.g. "/home/rajeev/myProjects/nounlist.txt"
myfile = string(ARGS[1])

# Read the word list without a header row
mydf = CSV.read(myfile, DataFrame; header = false)

# Print the header row for the output CSV
println("Word,Length,Vowel Whole,Vowel Last2,Consonant Whole,Consonant Last2,Digraph Whole,Digraph Last2,Noun pattern,Verb pattern,Adverb pattern,Adjective pattern,Class")

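for myrow in eachrow(mydf)
    # The word is taken from the last column of each row
    mysentence = string(myrow[end])
    # Skip words shorter than three characters and non-ASCII words
    # (the index arithmetic below assumes one byte per character)
    if length(mysentence) > 2 && isascii(mysentence)
        wordending = mysentence[end-1:end]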

        # Regexes for character-level patterns associated with each part of speech.
        # All patterns are lowercase, so uppercase letters are not matched.
        myVowelsRegex = r"a|e|i|o|u"
        mydigraphsRegex = r"ch|ci|ck|gh|ng|ph|qu|rh|sc|sh|th|ti|wh|wr|zh"
        nounRegexSuffixes = r"ion|sion|tion|acy|ance|ence|hood|ar|or|ism|ist|ment|ness|u|ity"
        verbRegexSuffixes = r"ify|ate|ize|en"
        adjectiveRegexSuffixes = r"al|ful|ly|ic|ish|like|our|y|ate|able|ible"
        adverbRegexSuffixes = r"ly"

        # Count PoS suffix-pattern matches within the last three characters of the word.
        # Note: alternatives longer than three characters (e.g. "ment", "ness", "able")
        # can never match inside this three-character window.
        wordtail = mysentence[end-2:end]
        nounlast = length(collect(eachmatch(nounRegexSuffixes, wordtail)))
        verblast = length(collect(eachmatch(verbRegexSuffixes, wordtail)))
        adjectivelast = length(collect(eachmatch(adjectiveRegexSuffixes, wordtail)))
        adverblast = length(collect(eachmatch(adverbRegexSuffixes, wordtail)))

        # Vowel, consonant and digraph counts for the whole word and for its last two characters
        vowelsWhole = length(collect(eachmatch(myVowelsRegex, mysentence)))
        vowelsLast2 = length(collect(eachmatch(myVowelsRegex, wordending)))
        # Every character that is not a lowercase vowel is counted as a consonant
        consonantsWhole = length(mysentence) - vowelsWhole
        consonantsLast2 = 2 - vowelsLast2
        digraphWhole = length(collect(eachmatch(mydigraphsRegex, mysentence)))
        digraphLast2 = length(collect(eachmatch(mydigraphsRegex, wordending)))

        #println(mysentence,",",adjectivelast,",",mysentence[end-2:end])

        # Emit one CSV row per word; the input file path is written as the Class column
        println(mysentence,",",length(mysentence),",",vowelsWhole,",",vowelsLast2,",",consonantsWhole,",",consonantsLast2,",",digraphWhole,",",digraphLast2,",",nounlast,",",verblast,",",adverblast,",",adjectivelast,",",myfile)

    end  # closes the if block
end  # closes the for loop
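
The script above counts suffix-pattern matches anywhere inside the last three characters of each word. As a point of comparison, here is a minimal sketch of an end-anchored variant that tests the whole word against the same noun suffix list. The names `NOUN_SUFFIX_ANCHORED`, `NOUN_SUFFIX_WINDOW`, `has_noun_suffix` and `window_count` are hypothetical and not part of the original script, and the sample words are illustrative only. Whether the anchored flag is a better ML feature than the window count is not established here; this only shows how the two differ.

```julia
# Hypothetical helpers, not part of the original script.

# Same noun alternatives as the script, but anchored to the end of the word,
# so four-letter suffixes such as "ment" and "ness" can match.
const NOUN_SUFFIX_ANCHORED = r"(?:ion|sion|tion|acy|ance|ence|hood|ar|or|ism|ist|ment|ness|u|ity)$"

# Unanchored pattern, applied to the last three characters as in the script.
const NOUN_SUFFIX_WINDOW = r"ion|sion|tion|acy|ance|ence|hood|ar|or|ism|ist|ment|ness|u|ity"

# true if the (lowercased) word ends with one of the listed suffixes
has_noun_suffix(word::AbstractString) = occursin(NOUN_SUFFIX_ANCHORED, lowercase(word))

# Number of matches inside the last three characters.
# Assumes an ASCII word of length >= 3, the same guard the script applies.
window_count(word::AbstractString) =
    length(collect(eachmatch(NOUN_SUFFIX_WINDOW, word[end-2:end])))

# Print both features side by side for a few sample words
for word in ["movement", "kindness", "nation", "column"]
    println(word, ",", has_noun_suffix(word), ",", window_count(word))
end
```

With these definitions, "movement" ends in "ment" but scores zero in the three-character window, while "column" does not end in any listed suffix yet scores one in the window because of the "u" inside "umn".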
