Recapitulation of Chapter-1

In Chapter 1, we introduced R as a powerful statistical programming language widely used in data analysis. The chapter outlined R's key features, such as its extensive package ecosystem and strong data visualization capabilities, which make it a popular choice for analytics professionals. It also highlighted the benefits of R, including its flexibility, open-source nature, and active community support. Finally, it provided a brief guide on how to download and set up R, along with an overview of basic usage.

In Chapter 2, we explore the basics of R programming in RStudio, how to assign variables, and the different kinds of variables that can be assigned.

1. Basics of R Programming

1.1. Writing and Executing Code:

Let’s write some simple code in the R console:

  • Arithmetic Operations:

In R, arithmetic operations are performed using standard operators such as + for addition, - for subtraction, * for multiplication, and / for division. R also supports exponentiation with ^ and modulus with %%. These operations can be applied to numbers, vectors, and matrices, allowing for versatile data manipulation and calculation.
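A minimal sketch of these operators at the console, first on single numbers and then element-wise on a vector and a matrix. The variable names (a, b, v, m) are illustrative only:

```r
# Basic arithmetic on single numbers
a <- 10
b <- 3
a + b   # addition: 13
a - b   # subtraction: 7
a * b   # multiplication: 30
a / b   # division: 3.333333

# The same operators work element-wise on vectors
v <- c(1, 2, 3)
v * 2               # 2 4 6
v + c(10, 20, 30)   # 11 22 33

# ...and on matrices
m <- matrix(1:4, nrow = 2)   # 2 x 2 matrix, filled column by column
m * 10                       # every element multiplied by 10
```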

In R, exponentiation is performed with the ^ operator, which raises a number (the base) to the power of an exponent.
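A few illustrative uses of ^, including negative and fractional exponents. R also accepts ** as a synonym for ^, although ^ is the usual form:

```r
2^3            # 8
5^2            # 25
2^-1           # 0.5: a negative exponent gives the reciprocal
9^0.5          # 3: an exponent of 0.5 is a square root
c(1, 2, 3)^2   # 1 4 9: applied element-wise
2 ** 3         # 8: ** is translated to ^ by the parser
```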

In R, the modulus operation, which gives the remainder of a division, is performed using the %% operator.
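A short sketch of %% on whole numbers, decimals, a negative number, and a vector. Note that in R the result takes the sign of the divisor:

```r
10 %% 3           # 1
15 %% 5           # 0: 15 divides evenly by 5
7.5 %% 2          # 1.5: also works with decimals
-7 %% 3           # 2: the result has the sign of the divisor (3)
c(4, 5, 6) %% 2   # 0 1 0: a common way to test for even/odd numbers
```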


Once you have written this code in an R script, save the file with the .R extension, which RStudio uses by default for R scripts.

  • Assigning Variables:

In R, variables are assigned using the assignment operator <-. Assigned variables can be of different types. Let’s discuss each type of variable one by one:
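Before going through the types, here is a minimal sketch of assignment itself. Besides <-, R also accepts = (at the top level) and the rightward -> operator; <- is the conventional choice and is what the tidyverse style guide recommends. The variable names are illustrative:

```r
x <- 5              # conventional assignment
y = 10              # '=' also assigns at the top level
15 -> z             # rightward assignment, rarely used in practice
print(c(x, y, z))   # 5 10 15
```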

1. Numeric Variables:

Numeric variables can store integers, floating-point numbers, or any real number. Here's how you can work with numeric variables in R:

Integers

Floating-point numbers

Real numbers
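A minimal sketch covering the three cases above, with illustrative variable names. One point worth noting: a whole number typed without the L suffix (such as 25) is still stored as a double, not as an integer; integers proper are covered later in this chapter.

```r
# Whole numbers: without the L suffix, R stores them as double
age <- 25
typeof(age)         # "double"

# Floating-point numbers
price <- 19.99
temperature <- -3.5

# Other real numbers, e.g. scientific notation or results of functions
big <- 1.5e6        # 1500000
root2 <- sqrt(2)    # 1.414214
print(c(age, price, temperature, big, root2))
```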


To check whether a variable is numeric, you can use the class(), typeof(), or is.numeric() functions.
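An illustrative check on a decimal value: class() reports the general class, typeof() the storage type, and is.numeric() gives a TRUE/FALSE answer. A value in quotes is character, not numeric:

```r
x <- 3.14
class(x)             # "numeric"
typeof(x)            # "double"
is.numeric(x)        # TRUE
is.numeric("3.14")   # FALSE: quoted text is character
```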


In R, you can convert a floating-point number into an integer and vice versa.

To convert a floating-point number to an integer in R, you can use several methods depending on how you want to handle the conversion. Here are the most common approaches:

1. Using as.integer()

The as.integer() function converts a numeric value to an integer by truncating (removing) the decimal part.

> float_num<-3.99

> int_num<-as.integer(float_num)

> print(int_num)

[1] 3

2. Using floor()

The floor() function rounds down to the nearest integer.

> float_num<-3.99

> int_num<-floor(float_num)

> print(int_num)

[1] 3

3. Using ceiling()

The ceiling() function rounds up to the nearest integer.

> float_num<-3.99

> int_num<-ceiling(float_num)

> print(int_num)

[1] 4

4. Using round()

The round() function rounds to the nearest integer. You can specify the number of decimal places to round to.

> float_num<-3.99

> int_num<-round(float_num)

> print(int_num)

[1] 4
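The number of decimal places is set with the digits argument. One behavior that can surprise beginners: as documented in R's ?round help page, halfway cases at digits = 0 are rounded to the nearest even number (IEC 60559 "round half to even"), so 2.5 becomes 2. A small illustrative sketch:

```r
round(3.14159, digits = 2)   # 3.14
round(3.14159)               # 3: digits defaults to 0
round(1234.5678, -2)         # 1200: negative digits round to tens, hundreds, ...

# Halfway cases go to the even neighbour
round(0.5)   # 0
round(1.5)   # 2
round(2.5)   # 2
```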

5. Using trunc()

The trunc() function truncates the decimal part of the number, effectively similar to as.integer().

> float_num<-3.99

> int_num<-trunc(float_num)

> print(int_num)

[1] 3
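Of the five methods above, only as.integer() returns an object of type integer; floor(), ceiling(), round(), and trunc() return whole-number values that are still stored as double. The sketch below checks this with typeof(), and also shows that as.integer() truncates toward zero, which differs from floor() for negative numbers:

```r
float_num <- 3.99
typeof(as.integer(float_num))   # "integer"
typeof(floor(float_num))        # "double"
typeof(ceiling(float_num))      # "double"
typeof(round(float_num))        # "double"
typeof(trunc(float_num))        # "double"

# Wrap the result in as.integer() if you need the integer type
int_num <- as.integer(round(float_num))
typeof(int_num)                 # "integer"

# Negative numbers: truncation vs rounding down
as.integer(-3.99)   # -3: truncates toward zero
floor(-3.99)        # -4: rounds down
```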

You can also convert an integer to a floating-point number in R. In R, this conversion is straightforward because R performs automatic type coercion when needed. However, you can explicitly convert an integer to a floating-point number using the as.numeric() function.

> int_num<-53L

> print(typeof(int_num))

[1] "integer"

> float_num<-as.numeric(int_num)

> print(typeof(float_num))

[1] "double"

> print(float_num)

[1] 53

as.numeric() Function: This function converts an integer or other types to numeric (which in R is of type double). This allows for floating-point arithmetic and operations that require decimal precision.

Automatic Coercion: R automatically promotes integers to floating-point numbers when performing operations that require a floating-point result.

Example of Automatic Coercion

> int_num <- 42L

> result <- int_num + 0.5

> print(typeof(result))

[1] "double"

> print(result)

[1] 42.5

2. Character Variables:

You can create character variables in R by assigning text to a variable using quotes (" or ').

> name <- "Swati Ramanujam"

> print(name)

[1] "Swati Ramanujam"

In order to check whether the variable used is a character, you can use the class(), typeof(), or is.character() functions:

> class(name)

[1] "character"

> typeof(name)

[1] "character"

> is.character(name)

[1] TRUE

Character Operations

1. Concatenation

You can combine multiple character strings using the paste() or paste0() functions.

> # using paste() with a space separator

> full_name<-paste("Swati","Ramanujam")

> print(full_name)

[1] "Swati Ramanujam"
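paste() separates its inputs with a space by default; the sep argument changes the separator, and paste0() joins with no separator at all. Both are vectorized. An illustrative sketch:

```r
first <- "Swati"
last  <- "Ramanujam"
paste(first, last)              # "Swati Ramanujam" (space by default)
paste(first, last, sep = "_")   # "Swati_Ramanujam"
paste0(first, last)             # "SwatiRamanujam" (no separator)
paste0("file_", 1:3, ".csv")    # "file_1.csv" "file_2.csv" "file_3.csv"
```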

2. String Length

The nchar() function returns the number of characters in a string.

> length_name <- nchar(full_name)

> print(length_name)

[1] 15

3. Substring Extraction

The substr() function extracts the characters between a start and a stop position, both inclusive.

> # Extracting a substring

> sub_name <- substr(name, 1, 5)

> print(sub_name)

[1] "Swati"

4. Changing Case

You can change the case of characters using tolower() and toupper().

> lower_name <- tolower(name)

> print(lower_name)

[1] "swati ramanujam"

> upper_name <- toupper(name)

> print(upper_name)

[1] "SWATI RAMANUJAM"

Handling Character Vectors

Character vectors can be manipulated similarly to other vector types in R.

> fruits <- c("apple", "banana", "cherry")

> print(fruits)

[1] "apple"  "banana" "cherry"

> more_fruits <- c(fruits, "date", "elderberry")

> print(more_fruits)

[1] "apple"      "banana"     "cherry"     "date"       "elderberry"

Converting Other Types to Character

You can convert numeric or other types of variables to character using the as.character() function.

> num <- 432

> char_num <- as.character(num)

> print(typeof(char_num))

[1] "character"

> print(char_num)

[1] "432"

3. Logical Variables:

Logical variables in R are used to represent Boolean values: TRUE and FALSE. They are fundamental in controlling the flow of programs through conditional statements, loops, and logical operations.

Creating Logical Variables

You can create logical variables by directly assigning the values TRUE or FALSE.

> is_honest<-TRUE

> is_dishonest<-FALSE

> print(is_honest)

[1] TRUE

> print(is_dishonest)

[1] FALSE

Checking the Type

You can check the type of a variable using the typeof() function.

> print(typeof(is_honest))

[1] "logical"

> print(typeof(is_dishonest))

[1] "logical"

Logical Operations

Logical operations can be performed on logical variables, resulting in TRUE or FALSE.

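AND (&) / OR (||)

The AND operation returns TRUE only if both operands are TRUE.

The OR operation returns TRUE if at least one operand is TRUE.

The single-character forms & and | work element-wise on vectors, while the double forms && and || compare single values and are typically used inside if conditions.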

> result <- is_honest & is_dishonest

> print(result)

[1] FALSE

> result<-is_honest||is_dishonest

> print(result)

[1] TRUE
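A sketch contrasting the vectorized and scalar forms, with illustrative vectors. In recent R versions (4.3.0 and later), passing a vector longer than one to && or || is an error, so the scalar forms belong in if() conditions:

```r
v1 <- c(TRUE, FALSE, TRUE)
v2 <- c(TRUE, TRUE, FALSE)
v1 & v2   # TRUE FALSE FALSE: element-wise AND
v1 | v2   # TRUE TRUE TRUE: element-wise OR

# && and || expect a single TRUE/FALSE on each side, e.g. inside if()
x <- 5
if (x > 0 && x < 10) print("x is between 0 and 10")
```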

Logical Comparisons

Logical comparisons between numeric or character values return logical variables.

1. Equal to (==)

Checks if two values are equal.

> x<-5

> y<-7

> x==y

[1] FALSE

2. Not equal to (!=)

Checks if two values are not equal.

> x<-5

> y<-7

> x!=y

[1] TRUE

3. Greater than (>)

Checks if one value is greater than another.

> x<-5

> y<-7

> y>x

[1] TRUE

4. Less than (<)

Checks if one value is less than another.

> x<-5

> y<-7

> x<y

[1] TRUE

5. Greater than or equal to (>=)

Checks if one value is greater than or equal to another.

> x<-5

> y<-7

> y>=x

[1] TRUE

6. Less than or equal to (<=)

Checks if one value is less than or equal to another.

> x<-5

> y<-7

> x<=y

[1] TRUE
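These comparison operators are vectorized, so comparing a vector with a single value gives one TRUE/FALSE per element, and they also work on character strings. One caution for decimals: floating-point rounding can make == fail on values that look equal, so all.equal() wrapped in isTRUE() is the usual tolerant comparison. An illustrative sketch:

```r
scores <- c(45, 72, 90, 38)
scores >= 50          # FALSE TRUE TRUE FALSE

"apple" == "apple"    # TRUE
"apple" != "banana"   # TRUE
"apple" < "banana"    # TRUE: strings compare in (locale-dependent) alphabetical order

0.1 + 0.2 == 0.3                    # FALSE: floating-point rounding
isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE: comparison with a tolerance
```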

Logical Vectors

Logical variables can also be part of vectors, allowing you to perform element-wise logical operations.

> vec1 <- c(TRUE, FALSE, TRUE)

> vec2 <- c(FALSE, FALSE, TRUE)

> result <- vec1 & vec2

> print(result)

[1] FALSE FALSE  TRUE

Logical Functions

There are several functions in R that operate on logical vectors:

1. any()

Returns TRUE if at least one element in a logical vector is TRUE.

> vec <- c(FALSE, FALSE, TRUE)

> result <- any(vec)

> print(result)

[1] TRUE

2. all()

Returns TRUE if all elements in a logical vector are TRUE.

> vec <- c(FALSE, TRUE, TRUE)

> result<-all(vec)

> print(result)

[1] FALSE

> vec <- c(TRUE, TRUE, TRUE)

> result<-all(vec)

> print(result)

[1] TRUE

Coercion to Logical Type

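You can convert other data types to logical using the as.logical() function. Non-zero numeric values convert to TRUE, zero converts to FALSE, and NA stays NA. Character strings such as "TRUE", "true", "T", "FALSE", "false", and "F" are also converted; any other string, including an empty string, becomes NA.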

> num <- 10

> log_val <- as.logical(num)

> print(log_val)

[1] TRUE

> num <- 0

> log_val <- as.logical(num)

> print(log_val)

[1] FALSE

> num <- NA

> log_val<-as.logical(num)

> print(log_val)

[1] NA
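A quick sketch of coercing character strings and fractional numbers, which the examples above do not cover:

```r
as.logical(c("TRUE", "true", "T", "False", "F"))   # TRUE TRUE TRUE FALSE FALSE
as.logical(c("yes", ""))                           # NA NA: not recognized
as.logical(c(-2, 0, 0.5))                          # TRUE FALSE TRUE: any non-zero is TRUE
```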

Use Cases of Logical Variables

Logical variables are widely used in:

  • Conditional Statements: To control the flow of code (e.g., if, else).
  • Loop Control: For breaking or continuing loops based on conditions.
  • Subsetting Data: To filter or select data based on logical conditions.

Logical variables and operations are fundamental for decision-making in R programs, allowing you to control the flow of execution based on conditions.
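One small illustrative example for each use case listed above; the data values are made up for demonstration:

```r
temperature <- 32

# Conditional statement
if (temperature > 30) {
  print("It's hot")
} else {
  print("It's pleasant")
}

# Loop control: stop at the first negative value
values <- c(4, 8, -1, 6)
for (v in values) {
  if (v < 0) break   # leaves the loop, so 6 is never printed
  print(v)
}

# Subsetting: keep only the elements where the condition is TRUE
ages <- c(12, 25, 17, 40)
adults <- ages[ages >= 18]
print(adults)   # 25 40
```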

4. Factor Variables:

Factor variables in R are used to represent categorical data, which can be either ordered or unordered. Factors are essential for handling categorical variables in statistical modeling and data analysis because they help R understand the data's categorical nature.

Creating Factor Variables

You can create a factor in R using the factor() function.

> fruits<-c("apple", "banana", "orange", "apple", "mango" )

> print(fruits)

[1] "apple"  "banana" "orange" "apple"  "mango"

> fruits <- factor(c("apple", "banana", "orange", "apple", "mango"))

> levels(fruits)

[1] "apple"  "banana" "mango"  "orange"

In this example:

  • The vector fruits is converted into a factor with four levels: apple, banana, mango, and orange.
  • R automatically determines the levels from the unique values in the vector and, by default, sorts them alphabetically.

Checking the Type

You can verify that a variable is a factor using the typeof() and class() functions.

> print(typeof(fruits))

[1] "integer"

> print(class(fruits))

[1] "factor"

Factors are stored as integers internally, with each unique category (level) mapped to a corresponding integer. The class() function returns "factor" because the variable is recognized as a factor.

Specifying Levels and Order

You can manually specify the levels of a factor, as well as their order, which is particularly important for ordinal data.

> sizes <- factor(c("small", "large", "medium", "large", "small"),

+ levels = c("small", "medium", "large"))

> print(sizes)

[1] small  large  medium large  small

Levels: small medium large
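Specifying levels fixes their order, but the factor above is still unordered (nominal). To tell R the categories are ranked, add ordered = TRUE; comparisons such as > and functions such as max() then use the level order. A minimal sketch reusing the same sizes (sizes_ord is an illustrative name):

```r
sizes_ord <- factor(c("small", "large", "medium", "large", "small"),
                    levels = c("small", "medium", "large"),
                    ordered = TRUE)
print(sizes_ord)         # Levels: small < medium < large

sizes_ord > "small"      # FALSE TRUE TRUE TRUE FALSE
max(sizes_ord)           # large
is.ordered(sizes_ord)    # TRUE
```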

Converting Factors to Numeric or Character

Sometimes, you may need to convert factors back to their original numeric or character form.

1. Converting to Character

You can convert a factor to a character using as.character().

> fruits_char <- as.character(fruits)

> print(fruits_char)

[1] "apple"? "banana" "orange" "apple"? "mango"

2. Converting to Numeric

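Calling as.numeric() directly on a factor returns its underlying integer codes, not the original values. To get the original numbers back, convert the factor to character first and then to numeric: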

> num_factor <- factor(c(10, 20, 10, 30))

> num_values <- as.numeric(as.character(num_factor))

> print(num_values)

[1] 10 20 10 30
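The difference between the two approaches is easy to see side by side. The last line is an equivalent idiom suggested in R's own ?factor help page; indexing by a factor uses its integer codes:

```r
num_factor <- factor(c(10, 20, 10, 30))
as.numeric(num_factor)                       # 1 2 1 3: the internal level codes
as.numeric(as.character(num_factor))         # 10 20 10 30: the original values
as.numeric(levels(num_factor))[num_factor]   # 10 20 10 30: equivalent idiom
```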

Manipulating Factor Levels

You can manipulate the levels of a factor in various ways:

1. Renaming Levels

You can rename the levels of a factor by assigning new names, in the same order as the existing levels, with the levels() function.

> levels(fruits)<-c("Apple", "Banana", "Mango", "Orange")

> print(fruits)

[1] Apple  Banana Orange Apple  Mango

Levels: Apple Banana Mango Orange

2. Dropping Unused Levels

After subsetting a factor, you may end up with unused levels. You can drop these using the droplevels() function.

> fruits <- factor(c("apple", "banana", "orange", "apple", "mango"))

> subset_fruits <- fruits[1:3]

> print(subset_fruits)

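[1] apple  banana orange

Levels: apple banana mango orange

> subset_fruits <- droplevels(subset_fruits)

> print(subset_fruits)

[1] apple  banana orange

Levels: apple banana orange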

Factors in Data Frames

Factors are often used in data frames to represent categorical variables, especially when working with datasets imported from CSV files.

> df <- data.frame(ID = 1:5, Fruits = fruits)

> print(df)

  ID Fruits
1  1  apple
2  2 banana
3  3 orange
4  4  apple
5  5  mango

We will discuss this again when we talk about data frames in R.

Use Cases of Factors

Factors are commonly used in:

  • Statistical Modeling: Categorical predictors in regression, ANOVA, and other models.
  • Data Summarization: Grouping data by categories and summarizing within each group.
  • Plotting: Grouping data in plots (e.g., boxplots by category).
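As an illustration of the summarization use case, table() counts observations per level and tapply() applies a function within each group. The fruit weights below are hypothetical values made up for the example; for plotting, a call such as boxplot(weight ~ fruit) groups the boxes by the same factor.

```r
fruit  <- factor(c("apple", "banana", "apple", "mango", "banana"))
weight <- c(150, 120, 170, 200, 110)   # hypothetical weights in grams

table(fruit)                  # counts per level: apple 2, banana 2, mango 1
tapply(weight, fruit, mean)   # mean per level: apple 160, banana 115, mango 200
```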

Important Considerations

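  • Automatic Conversion: In R versions before 4.0.0, data.frame() and read.csv() converted character columns to factors by default. Since R 4.0.0 the default is stringsAsFactors = FALSE; you can still set stringsAsFactors = TRUE or FALSE explicitly to control this behavior.
  • Levels Ordering: For ordinal data, create an ordered factor (ordered = TRUE) with the levels in the correct order; otherwise, R treats the categories as nominal data.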

Factor variables in R are powerful tools for handling categorical data, especially in statistical analysis and data visualization. Properly understanding and managing factors is crucial for accurate data analysis.

5. Complex Variables:

Complex variables in R are used to represent complex numbers, which consist of both a real and an imaginary part. Complex numbers are particularly useful in fields like engineering, physics, and certain areas of mathematics.

Creating Complex Variables

To create a complex number in R, you use the form a + bi, where a is the real part, and b is the imaginary part (indicated by i).

Example of Creating a Complex Variable:

> z <- 3 + 4i  # a complex number with real part 3 and imaginary part 4

> print(z)

[1] 3+4i

Checking the Type of a Complex Variable

You can check the type of a complex variable using the typeof() and class() functions:

> typeof(z)

[1] "complex"

> class(z)

[1] "complex"

Operations on Complex Variables

R supports various operations on complex numbers, including addition, subtraction, multiplication, division, and more.

Addition and Subtraction:

> z1<-3+4i

> z2<-1-2i

> sum <- z1 + z2

> print(sum)

[1] 4+2i

> diff<-z1-z2

> print(diff)

[1] 2+6i

Multiplication and Division:

> prod <- z1 * z2

> print(prod)

[1] 11-2i

> quot <- z1 / z2

> print(quot)

[1] -1+2i

Working through these by hand shows where the results come from (recall that i^2 = -1).

Multiplication, with z1 = 3+4i and z2 = 1-2i:

$$
z_1 z_2 = (3+4i)(1-2i) = 3 - 6i + 4i - 8i^2 = 3 - 2i + 8 = 11 - 2i
$$

Division, z1/z2:

  1. Multiply the numerator and the denominator by the conjugate of the denominator. The conjugate of 1-2i is 1+2i:

$$
\frac{z_1}{z_2} = \frac{3+4i}{1-2i} \cdot \frac{1+2i}{1+2i}
$$

  2. Expand the numerator:

$$
(3+4i)(1+2i) = 3 + 6i + 4i + 8i^2 = 3 + 10i - 8 = -5 + 10i
$$

  3. Expand the denominator:

$$
(1-2i)(1+2i) = 1 + 2i - 2i - 4i^2 = 1 + 4 = 5
$$

  4. Combine the results:

$$
\frac{-5 + 10i}{5} = -1 + 2i
$$

Summary

So, the result of dividing (3+4i) by (1-2i) is -1+2i, which matches the output from R above.

Complex Vectors

Just like other types, you can create vectors of complex numbers.

> complex_vector <- c(1+2i, 3+4i, 5+6i)

> print(complex_vector)

[1] 1+2i 3+4i 5+6i

Summary

Complex variables in R are powerful tools for handling mathematical operations that involve both real and imaginary numbers. R provides a variety of functions to manipulate and analyze complex numbers, making it a versatile environment for working with complex data.
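A few of the base R functions for working with complex numbers, shown on the same z = 3+4i. Note that sqrt(-4) on a plain (double) number returns NaN with a warning; the input has to be complex to get an imaginary root:

```r
z <- 3 + 4i
Re(z)     # 3: real part
Im(z)     # 4: imaginary part
Mod(z)    # 5: modulus, sqrt(3^2 + 4^2)
Conj(z)   # 3-4i: complex conjugate
Arg(1i)   # 1.570796: angle in radians (pi / 2)

# Square roots of negative numbers need a complex input
sqrt(as.complex(-4))   # 0+2i
```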

6. Integer Variables:

Creating Integer Variables

To create an integer variable in R, you use the L suffix after the number. This explicitly tells R that the number should be treated as an integer.

> x <- 42L

> y <- -7L

> x+y

[1] 35

Checking the Type of a Variable

To check if a variable is an integer, use the is.integer() function:

> is.integer(x)  # Returns TRUE if x is an integer

[1] TRUE

> is.integer(42) # Returns FALSE because 42 without L is a numeric type

[1] FALSE

Converting Between Numeric and Integer

  • From Numeric to Integer:

You can convert a numeric variable to an integer using the as.integer() function:

> num <- 42.7

> int <- as.integer(num)

> print(int)

[1] 42

Arithmetic Operations with Integers

Integers in R can be used in arithmetic operations just like numeric variables:

> a <- 10L

> b <- 3L

> sum <- a + b  # Addition

> print(sum)

[1] 13

> diff <- a - b  # Subtraction

> print(diff)

[1] 7

> prod <- a * b  # Multiplication

> print(prod)

[1] 30

> quot <- a / b  # Division

> print(quot)

[1] 3.333333

Note that dividing one integer by another with / always returns a numeric (double) result, even when both operands are integers. To get an integer result, use integer division (%/%); the modulus operator (%%) returns the remainder:

> quot_int <- a %/% b

> print(quot_int)

[1] 3

> mod <- a %% b

> print(mod)

[1] 1
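A quick type check on these results, plus a reminder that %/% and %% are not limited to integers; they also work on doubles, in which case the result is a double:

```r
a <- 10L
b <- 3L
typeof(a / b)     # "double": / always returns a double
typeof(a %/% b)   # "integer"
typeof(a %% b)    # "integer"

# %/% and %% also work on doubles
10.5 %/% 3   # 3
10.5 %% 3    # 1.5
```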

Summary

  • Use L to define integer variables.
  • Use is.integer() to check if a variable is an integer.
  • Convert between numeric and integer types with as.numeric() and as.integer().
  • Perform arithmetic operations and conversions as needed.

In R, integers are a subset of numeric variables. Here's a breakdown of why integer variables are treated as a separate type despite being part of the broader numeric category:

Numeric vs. Integer Types

  1. Numeric Type: In R, the default numeric type is double (also called double precision floating-point), which can handle decimal values and very large or very small numbers with high precision. Numeric variables can store both whole numbers and fractional numbers. Example: x <- 5.5 (a numeric variable with a fractional part).
  2. Integer Type: Integer is a more specific type of numeric variable that stores whole numbers without a decimal component. It is represented with the L suffix to explicitly denote that the number is an integer. Example: y <- 7L (an integer variable).
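The relationship between the two types can be checked directly: typeof() distinguishes them, while is.numeric() returns TRUE for both. The variables mirror the examples in the list above:

```r
x <- 5.5
y <- 7L
typeof(x)       # "double"
typeof(y)       # "integer"
class(y)        # "integer"
is.numeric(y)   # TRUE: integers count as numeric
is.integer(x)   # FALSE
typeof(7)       # "double": no L suffix, so not an integer
```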

Reasons for Separate Integer Type

  1. Precision and Memory Efficiency: Integers use 4 bytes per value, whereas doubles use 8. Storing whole numbers as integers therefore halves their memory footprint, which can save significant amounts of memory for large datasets.
  2. Explicit Type Declaration: By having separate integer and numeric types, R allows for explicit type declaration. This helps prevent unintended type conversions and ensures that operations involving integers behave as expected.
  3. Mathematical Operations: Certain operations behave differently with integers compared to numeric (double) values. For example, +, -, *, integer division (%/%), and modulus (%%) return an integer when both operands are integers, while / always returns a double. Keeping track of these conversions helps ensure that operations produce the intended results.
  4. Data Handling and Analysis: In data analysis, you might want to distinguish between categorical data (often represented as integers) and continuous numerical data (which might include fractional values). Explicit integer handling can be useful in such contexts.

Summary

While integers are indeed a subset of numeric types, they are treated as a distinct type in R to provide clarity and control over how data is stored and manipulated. This distinction helps with memory management, type-specific operations, and precise data handling.

In this chapter, we covered 6 of the 11 different variable types available in R. These variable types provide R with the flexibility to manage a wide range of data and operations, making it a powerful tool for statistical computing and data analysis. The remaining variable types will be explored in the next chapter.

More articles by Bhaswati Ramanujam
