So Many Words.

Let's do some more with words. R allows us to analyze the intent of the writer. Raspberry Pi allows us to simulate the reader.

R Programming Language

Natural Language Processing refers to the words in a document as "terms." Nothing mysterious about that. When you analyze a document, you want to count the terms. R allows you to create a matrix with terms on one axis and documents on the other. For example...

# The column names are terms
> head(RT_DTmatrix$dimnames$Terms)
[1] "aback"         "abandon"       "abanindra"     "abanindranath"
[5] "abash"         "abat"        
 
# the rows are documents
> head(RT_DTmatrix$dimnames$Docs)
[1] "Chitra..a.Play.in.One.Act"                                              
[2] "Creative.Unity"                                                         
[3] "Fruit.Gathering"                                                        
[4] "Gitanjali"                                                              
[5] "Glimpses.of.Bengal.Selected.from.the.Letters.of.Sir.Rabindranath.Tagore"
[6] "Mashi..and.Other.Stories" 

The full matrix has 14,026 terms and 19 documents. Here's how to build a document/term matrix from the works of Rabindranath Tagore.

# NLP: create document vs terms. 

# Load the tm package ---------------------------------
# install.packages("tm")
library(tm)

# start with the corpus developed in previous session
# RT_corpus contains the work of Rabindranath Tagore as found at project gutenberg
load(file = "Rabindranath_corpus.rdata")

# Term Document Matrix ----------------------------------------------------
# Term Document is terms on rows.
# Document Term is documents on rows
RT_TDmatrix <- TermDocumentMatrix(RT_corpus, control = list(
                           stopwords = TRUE, 
                           removePunctuation = TRUE,
                           removeNumbers = TRUE,
                           stemming = TRUE))

# This tokenizes and places terms across the top (as columns). Inspect the dimnames of RT_DTmatrix
RT_DTmatrix <- DocumentTermMatrix(RT_corpus, control = list(
                           stopwords = TRUE, 
                           removePunctuation = TRUE,
                           removeNumbers = TRUE,
                           stemming = TRUE))
inspect(RT_DTmatrix)

# This is a matrix, albeit of class DocumentTermMatrix.
# You can subset it like any other matrix and view the result with inspect()
inspect(RT_DTmatrix[ 3 , "heart"]) # row 3, column "heart"
inspect(RT_DTmatrix[  , "heart"]) # all rows, column "heart"
inspect(RT_DTmatrix[  , c("heart", "eye", "sad")]) # all rows, multiple columns
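
If the rows-and-columns idea still feels abstract, here is a minimal sketch in plain Python (not R, and not the tm package) of what a document/term matrix holds. The two-document corpus, the tiny stopword list, and the function names are all made up for illustration, and there is no stemming step, so the counts will not match what tm produces.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; tm uses a much longer one).
STOPWORDS = {"the", "a", "an", "of", "and", "is", "in", "my", "to"}

def tokenize(text):
    """Lowercase, drop punctuation and numbers, then drop stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOPWORDS]

def document_term_matrix(docs):
    """Return (doc_names, terms, rows): one row of term counts per document."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    terms = sorted(set().union(*counts.values()))
    names = list(docs)
    rows = [[counts[name][term] for term in terms] for name in names]
    return names, terms, rows

# Hypothetical corpus, standing in for the Tagore texts.
docs = {
    "Doc.One": "The heart is sad, and the eye is open.",
    "Doc.Two": "My heart sings; my heart dances in 1913.",
}
names, terms, rows = document_term_matrix(docs)

def lookup(doc_name, term):
    """Count of one term in one document (row by name, column by name)."""
    return rows[names.index(doc_name)][terms.index(term)]

print(terms)
print(lookup("Doc.Two", "heart"))
```

Here `lookup("Doc.Two", "heart")` plays the same role as subsetting one row and one column in the R listing. Swapping the rows and columns of `rows` would give the term/document layout instead.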

Confused? Here's a video to explain document/term and term/document matrices...

With this matrix of words by documents, you can do some really interesting stuff. More to come next week.

Raspberry Pi

Speech is words come to life. Or in the case of a computer, the simulation of words come to life. Here's a segment from Raspberry Pi weekly where we hook up a speech synthesizer.

It's obviously a machine speaking - but if it reads poetry, is it inhuman? Before you debate the theory, here's some code to implement an example...

# feed a line of text to Emic

import serial
import random
import time

ser = serial.Serial(
    port='/dev/ttyS0',
    baudrate = 9600,
    parity=serial.PARITY_NONE,
    stopbits=serial.STOPBITS_ONE,
    bytesize=serial.EIGHTBITS
    )

   
def speakAString( theCommand, sayThis):
    time.sleep(.25) # pause to give the Emic a second to reset
    
    sayThisFormatted = theCommand + str(sayThis) + "\r" # the Emic requires lines to end in a carriage return
    # print("sending",sayThisFormatted)
    ser.write(sayThisFormatted.encode())
    
sayThis = "nothing yet"


while (sayThis != "stop"):
    # choose a random voice
    theVoice = random.randint(0,8)
    speakAString("N",theVoice)
    print("Speaking with voice #", theVoice)
    
    # choose a speed
    theSpeed = random.randint(75,250) #set speaking rate. max = 600
    speakAString("W",theSpeed)
    print("Speaking Rate: ",theSpeed)
    
    # choose a volume
    theVolume = random.randint(-20,18) # Minimum is -48, but you can't hear it
    speakAString("V", theVolume)
    print("Volume: ", theVolume)
              
    sayThis = input("Listening to you: ")
    print("Speaking: ", sayThis)
    speakAString("S", sayThis)
    
    print("\n\n")

ser.close()
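
The listing above sends each command as a letter, a value, and a carriage return. As a rough sketch of that same format, here is a helper that builds the bytes without touching the serial port, so you can check them before wiring anything up. The name `build_command` and the `LIMITS` table are my own additions for illustration; the ranges come from the comments in the listing (rate max 600, volume minimum -48) and from the values it passes to `random.randint`, so treat them as assumptions to verify against your module's documentation.

```python
# Allowed ranges per command letter. These are assumptions drawn from the
# listing above, not values confirmed against the hardware.
LIMITS = {
    "N": (0, 8),     # voice number
    "W": (75, 600),  # speaking rate
    "V": (-48, 18),  # volume
}

def build_command(command, value):
    """Return the bytes for one command, e.g. b'N3\\r'.

    Numeric settings are clamped into their assumed range; any other
    command (such as "S" for speak) passes its value through unchanged.
    """
    if command in LIMITS:
        low, high = LIMITS[command]
        value = max(low, min(high, int(value)))
    return (command + str(value) + "\r").encode()

print(build_command("N", 3))
print(build_command("W", 900))      # out of range, clamped to 600
print(build_command("S", "hello"))
```

With something like this in place, `ser.write(build_command("V", theVolume))` could stand in for the string formatting inside `speakAString`.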


I'm on a mission to explore alternative user interfaces. Are monitors and keyboards our only options for communicating with a computer? I hope not, and here's one alternative.

What is this newsletter?

Every week I discuss the R language, Raspberry Pi or other R topics. Please share and subscribe!

#rstats #raspberrypi #r
