<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Blogs | Rajesh Majumder blog</title><link>https://rajeshmajumderblog.netlify.app/blog/</link><atom:link href="https://rajeshmajumderblog.netlify.app/blog/index.xml" rel="self" type="application/rss+xml"/><description>Blogs</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><lastBuildDate>Sun, 13 Jul 2025 00:00:00 +0000</lastBuildDate><image><url>https://rajeshmajumderblog.netlify.app/media/icon_hua2ec155b4296a9c9791d015323e16eb5_11927_512x512_fill_lanczos_center_2.png</url><title>Blogs</title><link>https://rajeshmajumderblog.netlify.app/blog/</link></image><item><title>Loki, the Tesseract, and the Secret of the Fourth Dimension</title><link>https://rajeshmajumderblog.netlify.app/blog/internal-project_xiii/</link><pubDate>Sun, 13 Jul 2025 00:00:00 +0000</pubDate><guid>https://rajeshmajumderblog.netlify.app/blog/internal-project_xiii/</guid><description>
&lt;div id="TOC">
&lt;ul>
&lt;li>&lt;a href="#the-trickster-and-the-cube" id="toc-the-trickster-and-the-cube">The Trickster and the Cube&lt;/a>&lt;/li>
&lt;li>&lt;a href="#waitwhat-even-is-a-tesseract" id="toc-waitwhat-even-is-a-tesseract">Wait—What Even Is a Tesseract?&lt;/a>&lt;/li>
&lt;li>&lt;a href="#visualizing-the-impossible" id="toc-visualizing-the-impossible">Visualizing the Impossible&lt;/a>&lt;/li>
&lt;li>&lt;a href="#real-life-analogies-so-your-brain-doesnt-explode" id="toc-real-life-analogies-so-your-brain-doesnt-explode">Real-Life Analogies (So Your Brain Doesn’t Explode)&lt;/a>&lt;/li>
&lt;li>&lt;a href="#physics-agrees" id="toc-physics-agrees">Physics Agrees&lt;/a>&lt;/li>
&lt;li>&lt;a href="#why-loki-and-the-tesseract-make-sense" id="toc-why-loki-and-the-tesseract-make-sense">Why Loki and the Tesseract Make Sense&lt;/a>&lt;/li>
&lt;li>&lt;a href="#final-thought-mischief-math-and-meaning" id="toc-final-thought-mischief-math-and-meaning">Final Thought: Mischief, Math, and Meaning&lt;/a>&lt;/li>
&lt;/ul>
&lt;/div>
&lt;p>&lt;em>“I am Loki of Asgard, and I am burdened with glorious purpose.”&lt;/em>&lt;/p>
&lt;p>That one line was enough to send chills down our spines—and New York into chaos.&lt;/p>
&lt;p>Loki. The God of Mischief. Prince of Asgard. Adopted son. Outcast. Trickster.&lt;/p>
&lt;p>He’s walked through fire, betrayal, redemption, timelines, and TVA offices. But there’s one thing that always seems to follow him like a shadow: &lt;strong>The Tesseract.&lt;/strong>&lt;/p>
&lt;p>That mysterious glowing blue cube… object of obsession, war, and wonder.&lt;/p>
&lt;p>But here’s the thing: the Tesseract wasn’t just a Marvel MacGuffin or a sci-fi light show. It represented something &lt;strong>far deeper&lt;/strong>—something most people miss, just like I did until now 😂.&lt;/p>
&lt;p>It wasn’t just about space. Or time. It was about &lt;strong>breaking the limits&lt;/strong> of what we understand. It was about the &lt;strong>fourth dimension&lt;/strong>.&lt;/p>
&lt;div id="the-trickster-and-the-cube" class="section level2">
&lt;h2>The Trickster and the Cube&lt;/h2>
&lt;p>Let’s rewind. In &lt;em>The Avengers&lt;/em> (2012), we see Loki land on Earth like a thunderbolt—armed with charm, chaos, and the Tesseract. He doesn’t just want to rule—he wants to bend reality. Slide between realms. Open doors no one else can.&lt;/p>
&lt;p>With the Tesseract in hand, he teleports across cities, escapes imprisonment, and whispers across dimensions. To most, it’s a weapon. To Loki? &lt;strong>It’s a key&lt;/strong>.&lt;/p>
&lt;p>Because what he’s always wanted isn’t power for power’s sake. It’s freedom. Freedom from being Thor’s shadow. From being Odin’s mistake. From being bound by one timeline, one fate, one reality.&lt;/p>
&lt;p>The Tesseract gave him that taste. Because the Tesseract is more than it seems.&lt;/p>
&lt;/div>
&lt;div id="waitwhat-even-is-a-tesseract" class="section level2">
&lt;h2>Wait—What Even Is a Tesseract?&lt;/h2>
&lt;p>The word &lt;em>tesseract&lt;/em> wasn’t made up by Marvel. It’s a real concept in geometry: a &lt;strong>4-dimensional cube&lt;/strong>.&lt;/p>
&lt;p>Sounds bonkers, right? Let’s walk through it step-by-step.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>0D: A Point&lt;/strong> – No length, no width—just a dot.&lt;/li>
&lt;li>&lt;strong>1D: A Line&lt;/strong> – Stretch that dot in one direction → a line.&lt;/li>
&lt;li>&lt;strong>2D: A Square&lt;/strong> – Drag the line sideways → now you have length + width.&lt;/li>
&lt;li>&lt;strong>3D: A Cube&lt;/strong> – Move that square up into space → you get depth.&lt;/li>
&lt;li>&lt;strong>4D: A Tesseract&lt;/strong> – Move a cube in a completely new direction → the fourth dimension.&lt;/li>
&lt;/ul>
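&lt;p>Notice the pattern in this construction: each step doubles the number of corners, so the counts run 1, 2, 4, 8, 16. A quick sketch in base R makes the counting concrete (illustrative code only):&lt;/p>
&lt;pre class="r">&lt;code># every vertex of the unit tesseract is a 4-tuple of 0s and 1s
vertices &amp;lt;- expand.grid(x = 0:1, y = 0:1, z = 0:1, w = 0:1)
nrow(vertices)  # 16 vertices
# two vertices share an edge when they differ in exactly one coordinate
sum(as.matrix(dist(vertices, method = &amp;quot;manhattan&amp;quot;)) == 1) / 2  # 32 edges&lt;/code>&lt;/pre>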
&lt;p>We can’t see that direction. We weren’t built to. But math says it’s there. Just like a flat cartoon can’t see “up,” but we can.&lt;/p>
&lt;p>&lt;img src="iii.gif" />&lt;/p>
&lt;/div>
&lt;div id="visualizing-the-impossible" class="section level2">
&lt;h2>Visualizing the Impossible&lt;/h2>
&lt;p>We cheat by looking at shadows:&lt;/p>
&lt;ul>
&lt;li>A cube casts a square shadow.&lt;/li>
&lt;li>A tesseract casts a cube-within-a-cube shadow that warps and rotates strangely.&lt;/li>
&lt;li>You’ve probably seen GIFs of a cube folding into itself—that’s a 3D shadow of a 4D shape. That’s the tesseract.&lt;/li>
&lt;/ul>
&lt;p>&lt;img src="kkk.gif" />&lt;/p>
&lt;p>A tesseract can be unfolded into eight cubes in 3D space, just as a cube can be unfolded into six squares in 2D space.&lt;/p>
&lt;p>&lt;img src="jjj.gif" />&lt;/p>
&lt;/div>
&lt;div id="real-life-analogies-so-your-brain-doesnt-explode" class="section level2">
&lt;h2>Real-Life Analogies (So Your Brain Doesn’t Explode)&lt;/h2>
&lt;ul>
&lt;li>&lt;p>&lt;strong>The Shadow World:&lt;/strong> Imagine a 2D world on paper. A 3D ball drops through it. The 2D beings see a circle that grows and shrinks—it’s magic to them. To us, it’s just physics.
Now flip it. A 4D object entering our space would look like a cube suddenly appearing, shifting, and vanishing.
&lt;em>Sound familiar?&lt;/em>&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>Storing in the 4th Dimension:&lt;/strong> Imagine your apartment’s too full. What if you could store your couch in a direction outside the 3 we know? That’s the kind of freedom a tesseract implies.&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>Reality Like a Flipbook:&lt;/strong> Our 3D world is one page. Flip the book, and you get a new world each time. The fourth dimension lets you flip pages—jump timelines—just like Loki did.&lt;/p>&lt;/li>
&lt;/ul>
&lt;/div>
&lt;div id="physics-agrees" class="section level2">
&lt;h2>Physics Agrees&lt;/h2>
&lt;p>This isn’t just sci-fi:&lt;/p>
&lt;ul>
&lt;li>In Einstein’s relativity, time acts as a fourth dimension, weaving space and time into four-dimensional spacetime.&lt;/li>
&lt;li>String theory requires 10 dimensions (11 in M-theory).&lt;/li>
&lt;li>Mathematically, the tesseract is perfectly well defined—even if we can’t see it.&lt;/li>
&lt;/ul>
&lt;/div>
&lt;div id="why-loki-and-the-tesseract-make-sense" class="section level2">
&lt;h2>Why Loki and the Tesseract Make Sense&lt;/h2>
&lt;p>Loki doesn’t want to rule Earth. He wants to escape being labeled: &lt;em>“adopted”&lt;/em>, &lt;em>“villain”&lt;/em>, &lt;em>“failure”&lt;/em>. He wants to rewrite himself. Be reborn. Free from fate.&lt;/p>
&lt;p>The Tesseract is his way out. His escape hatch from the script.&lt;/p>
&lt;/div>
&lt;div id="final-thought-mischief-math-and-meaning" class="section level2">
&lt;h2>Final Thought: Mischief, Math, and Meaning&lt;/h2>
&lt;p>Loki is all of us—bending rules, breaking molds, reaching for something bigger.&lt;/p>
&lt;p>&lt;img src="lll.gif" />&lt;/p>
&lt;p>The Tesseract is more than a cube. It’s a symbol of possibility. Of transcendence. Of the fourth dimension that lives not just in physics…&lt;/p>
&lt;p>…but maybe, also in hope.&lt;/p>
&lt;p>If your brain’s spinning like a tesseract-in-a-blender, good. That means you’ve seen a glimpse of something more—just like Loki did.&lt;/p>
&lt;p>Ohh, and lastly: thanks to Wikipedia for providing such excellent visual animations.&lt;/p>
&lt;/div></description></item><item><title>Clustering in R</title><link>https://rajeshmajumderblog.netlify.app/blog/external-project_ii/</link><pubDate>Tue, 12 Dec 2023 00:00:00 +0000</pubDate><guid>https://rajeshmajumderblog.netlify.app/blog/external-project_ii/</guid><description/></item><item><title>Comprehensive Summary of Some Most Applicable Machine Learning Techniques</title><link>https://rajeshmajumderblog.netlify.app/blog/external-project_iv/</link><pubDate>Tue, 12 Dec 2023 00:00:00 +0000</pubDate><guid>https://rajeshmajumderblog.netlify.app/blog/external-project_iv/</guid><description/></item><item><title>Concept of ANOVA and Its Sample Size Calculation Formula</title><link>https://rajeshmajumderblog.netlify.app/blog/external-project_iii/</link><pubDate>Tue, 12 Dec 2023 00:00:00 +0000</pubDate><guid>https://rajeshmajumderblog.netlify.app/blog/external-project_iii/</guid><description/></item><item><title>What is Small Area Estimation? Let's understand!</title><link>https://rajeshmajumderblog.netlify.app/blog/external-project_v/</link><pubDate>Tue, 12 Dec 2023 00:00:00 +0000</pubDate><guid>https://rajeshmajumderblog.netlify.app/blog/external-project_v/</guid><description/></item><item><title>How to convert a Shiny app to a standalone desktop application</title><link>https://rajeshmajumderblog.netlify.app/blog/internal-project_xii/</link><pubDate>Mon, 28 Aug 2023 00:00:00 +0000</pubDate><guid>https://rajeshmajumderblog.netlify.app/blog/internal-project_xii/</guid><description>
&lt;p>Last month I took on a consultancy project where I had to examine data on male breast cancer in Indian men and build an application for my clients—doctors, essentially—so they could use it in their future diagnoses. For basic statistical analysis I typically use R, and I typically use Python to build apps. But this time I ran into difficulties: I needed to deploy the app quickly, and the statistical tools used here weren’t available in Python (or, to be more honest, I didn’t know how to do that in Python).&lt;/p>
&lt;p>The only choice left was R Shiny, but this time the issue was that my clients wanted a desktop application. That was quite challenging for me, because I only knew how to use R Shiny to build a web application hosted on Shiny Server. I started looking for answers and came across a variety of them. Some articles recommend using Electron to turn a Shiny app into a desktop application; others suggest the RInno package. But I found them all incredibly challenging, and every approach I attempted was unsuccessful.&lt;/p>
&lt;p>I later discovered &lt;a href="http://blog.analytixware.com/2014/03/packaging-your-shiny-app-as-windows.html">“Packaging your Shiny App as a Windows desktop app”&lt;/a> on Analytixware’s site, which turned out to be a wonderfully simple and effective solution to my problem. Later, I came across &lt;a href="https://www.r-bloggers.com/2014/04/deploying-desktop-apps-with-r/">Lee Pang’s&lt;/a> article on R Bloggers, where he offers a similar solution. Here I’m going to explain the steps to convert a Shiny app into a standalone desktop app. Note that these steps are Windows-specific; they will not work on a Mac.&lt;/p>
&lt;div id="steps-to-convert-shiny-app-into-a-desktop-app" class="section level2">
&lt;h2>Steps to convert Shiny App into a Desktop app:&lt;/h2>
&lt;div id="step-1" class="section level3">
&lt;h3>Step 1&lt;/h3>
&lt;p>Create a folder in a location of your choice and give it the name you have chosen for your app.&lt;/p>
&lt;p>For example :&lt;/p>
&lt;ul>
&lt;li>&lt;strong>path:&lt;/strong> &lt;code>D:\Myapps\&lt;/code>&lt;/li>
&lt;li>&lt;strong>Create New folder:&lt;/strong> &lt;code>D:\Myapps\MyApp1\&lt;/code>&lt;/li>
&lt;/ul>
&lt;/div>
&lt;div id="step-2" class="section level3">
&lt;h3>Step 2&lt;/h3>
&lt;p>Download:&lt;/p>
&lt;ul>
&lt;li>R-Portable&lt;/li>
&lt;li>Google Chrome Portable&lt;/li>
&lt;/ul>
&lt;p>and install both into the &lt;code>MyApp1\&lt;/code> folder.&lt;/p>
&lt;p>So inside the &lt;code>MyApp1&lt;/code> folder there will be two more folders:&lt;/p>
&lt;p>&lt;code>D:\Myapps\MyApp1\GoogleChromePortable\&lt;/code>
&lt;code>D:\Myapps\MyApp1\R-Portable\&lt;/code>&lt;/p>
&lt;/div>
&lt;div id="step-3" class="section level3">
&lt;h3>Step 3&lt;/h3>
&lt;p>Download all dependencies (R packages) of your Shiny app into R-Portable’s package library.&lt;/p>
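&lt;p>One simple way to do this (a sketch; the package list below is a placeholder, so substitute your app’s actual dependencies) is to run a short install script with R-Portable’s own &lt;code>R.exe&lt;/code>, pointing &lt;code>lib&lt;/code> at its library folder:&lt;/p>
&lt;pre class="r">&lt;code># install your app&amp;#39;s dependencies into R-Portable&amp;#39;s own library
pkgs &amp;lt;- c(&amp;quot;shiny&amp;quot;)  # placeholder: list your app&amp;#39;s real dependencies here
install.packages(pkgs,
                 lib = &amp;quot;./R-Portable/App/R-Portable/library&amp;quot;,
                 repos = &amp;quot;https://cran.r-project.org&amp;quot;)&lt;/code>&lt;/pre>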
&lt;/div>
&lt;div id="step-4" class="section level3">
&lt;h3>Step 4&lt;/h3>
&lt;p>Create a folder called &lt;code>D:\Myapps\MyApp1\shiny\&lt;/code>. This is where the files for your Shiny app (e.g. &lt;code>ui.R&lt;/code>, &lt;code>server.R&lt;/code>, &lt;code>data.csv&lt;/code>, etc.) will reside.&lt;/p>
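&lt;p>As a minimal placeholder (hypothetical file contents, just to verify the setup end to end), the app could be as small as:&lt;/p>
&lt;pre class="r">&lt;code># D:\Myapps\MyApp1\shiny\ui.R
shinyUI(fluidPage(
  titlePanel(&amp;quot;MyApp1&amp;quot;),
  plotOutput(&amp;quot;hist&amp;quot;)
))

# D:\Myapps\MyApp1\shiny\server.R
shinyServer(function(input, output, session) {
  output$hist &amp;lt;- renderPlot(hist(rnorm(100)))
})&lt;/code>&lt;/pre>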
&lt;/div>
&lt;div id="step-5" class="section level3">
&lt;h3>Step 5&lt;/h3>
&lt;p>Add the following to &lt;code>server.R&lt;/code>, inside &lt;code>shinyServer(function(input, output, session) { ... })&lt;/code>. It is important to pass &lt;strong>session&lt;/strong> as the third argument! The code you need to add is:&lt;/p>
&lt;pre class="r">&lt;code>shinyServer(function(input, output, session) { ... }) {
# ... your other server code here
# close the R session when Chrome closes
session$onSessionEnded(function() {
stopApp()
q(&amp;quot;no&amp;quot;)
})
# ... your other server code here
}&lt;/code>&lt;/pre>
&lt;/div>
&lt;div id="step-6" class="section level3">
&lt;h3>Step 6&lt;/h3>
&lt;p>To launch the application you will need two scripts:&lt;/p>
&lt;ul>
&lt;li>&lt;code>D:\Myapps\MyApp1\runShinyApp.R&lt;/code>: an R script that loads the shiny package and launches your app via &lt;code>runApp()&lt;/code>&lt;/li>
&lt;li>A shell script (either a &lt;code>.bat&lt;/code> or &lt;code>.vbs&lt;/code> file) that invokes R-Portable&lt;/li>
&lt;/ul>
&lt;div id="step-6.1-create-runshinyapp.r" class="section level4">
&lt;h4>Step 6.1 Create runShinyApp.R:&lt;/h4>
&lt;p>Open a new Notepad file, paste the following lines of code, and save it as &lt;code>runShinyApp.R&lt;/code> in the &lt;code>D:\Myapps\MyApp1\&lt;/code> location.&lt;/p>
&lt;pre class="r">&lt;code>.libPaths(&amp;quot;./R-Portable/App/R-Portable/library&amp;quot;)
# the path to portable chrome
browser.path = file.path(getwd(),&amp;quot;GoogleChromePortable/GoogleChromePortable.exe&amp;quot;)
options(browser = browser.path)
shiny::runApp(&amp;quot;./Shiny/&amp;quot;,port=8888,launch.browser=TRUE)&lt;/code>&lt;/pre>
&lt;/div>
&lt;div id="step-6.2-create-shell-script-run.vbs-run.bat" class="section level4">
&lt;h4>Step 6.2 Create shell script (run.vbs / run.bat):&lt;/h4>
&lt;p>Again open a new Notepad file, paste the following code, and save it as &lt;code>run.vbs&lt;/code> or &lt;code>run.bat&lt;/code> in the &lt;code>D:\Myapps\MyApp1\&lt;/code> location (I created &lt;code>run.vbs&lt;/code>).&lt;/p>
&lt;pre>&lt;code>Randomize
CreateObject(&amp;quot;Wscript.Shell&amp;quot;).Run &amp;quot;R-Portable\App\R-Portable\bin\R.exe CMD BATCH --vanilla --slave runShinyApp.R&amp;quot; &amp;amp; &amp;quot; &amp;quot; &amp;amp; RND &amp;amp; &amp;quot; &amp;quot;, 0, False&lt;/code>&lt;/pre>
&lt;p>Now double-click &lt;code>run.vbs&lt;/code>, and your app will open in the portable Google Chrome browser.&lt;/p>
&lt;p>I highly recommend reading &lt;a href="http://blog.analytixware.com/2014/03/packaging-your-shiny-app-as-windows.html">Analytixware’s blog&lt;/a> and &lt;a href="https://www.r-bloggers.com/2014/04/deploying-desktop-apps-with-r/">Lee Pang’s&lt;/a> article on R Bloggers for a clearer understanding.&lt;/p>
&lt;/div>
&lt;/div>
&lt;/div></description></item><item><title>How to Add a Table of Contents in an R Markdown or Jupyter Notebook Document</title><link>https://rajeshmajumderblog.netlify.app/blog/internal-project_xi/</link><pubDate>Sun, 25 Jun 2023 00:00:00 +0000</pubDate><guid>https://rajeshmajumderblog.netlify.app/blog/internal-project_xi/</guid><description>
&lt;div id="TOC">
&lt;ul>
&lt;li>&lt;a href="#r-markdown" id="toc-r-markdown">R Markdown&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#pdf-document" id="toc-pdf-document">PDF Document&lt;/a>&lt;/li>
&lt;li>&lt;a href="#html-document" id="toc-html-document">HTML Document&lt;/a>&lt;/li>
&lt;/ul>&lt;/li>
&lt;li>&lt;a href="#jupyter-notebook" id="toc-jupyter-notebook">Jupyter Notebook&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#step-1-select-the-markdown-format" id="toc-step-1-select-the-markdown-format">Step 1: Select the Markdown Format&lt;/a>&lt;/li>
&lt;li>&lt;a href="#step-2-create-the-structure-of-the-table-of-content" id="toc-step-2-create-the-structure-of-the-table-of-content">Step 2: Create the Structure of the Table of Content&lt;/a>&lt;/li>
&lt;li>&lt;a href="#step-3-create-anchor-tags" id="toc-step-3-create-anchor-tags">Step 3: Create Anchor Tags&lt;/a>&lt;/li>
&lt;/ul>&lt;/li>
&lt;/ul>
&lt;/div>
&lt;p>As a statistician or data analyst, writing reports and showcasing results is an essential part of the job. In a large data-analysis project there will inevitably be several sections and sub-sections to explain in your report. The same goes for a blog post or article. So, to give your reader a quick overview of what your post covers, one of the most useful features is a table of contents.&lt;/p>
&lt;p>Today I’m going to share how I create the table of contents for the blog posts and reports that I usually write using R Markdown &amp;amp; Jupyter Notebook.&lt;/p>
&lt;div id="r-markdown" class="section level1">
&lt;h1>R Markdown&lt;/h1>
&lt;p>For R Markdown adding a content list is very easy. You can add a table of contents (TOC) using the &lt;code>toc&lt;/code> option. For example:&lt;/p>
&lt;div id="pdf-document" class="section level2">
&lt;h2>PDF Document&lt;/h2>
&lt;pre class="r">&lt;code>---
title: &amp;quot;Habits&amp;quot;
output:
pdf_document:
toc: true
---&lt;/code>&lt;/pre>
&lt;/div>
&lt;div id="html-document" class="section level2">
&lt;h2>HTML Document&lt;/h2>
&lt;pre class="r">&lt;code>---
title: &amp;quot;Habits&amp;quot;
output:
html_document:
toc: true
---&lt;/code>&lt;/pre>
&lt;p>There are some other options to customize content list:&lt;/p>
&lt;pre class="r">&lt;code>---
title: &amp;quot;Habits&amp;quot;
output:
html_document:
toc: true # table of content true/yes
toc_depth: 3 # upto three depths of headings (specified by #, ## and ###)
number_sections: true # if you want number sections at each table header
theme: united # many options for theme, this one is my favorite.
highlight: tango # specifies the syntax highlighting style
css: my.css # you can add your custom css, should be in same folder
---&lt;/code>&lt;/pre>
&lt;p>For more details see: &lt;a href="https://bookdown.org/yihui/rmarkdown/html-document.html">https://bookdown.org/yihui/rmarkdown/html-document.html&lt;/a>&lt;/p>
&lt;/div>
&lt;/div>
&lt;div id="jupyter-notebook" class="section level1">
&lt;h1>Jupyter Notebook&lt;/h1>
&lt;p>The only tools required to include a table of contents in a Jupyter notebook are &lt;strong>anchor tags&lt;/strong> placed at the appropriate headings. The entries of the table of contents are then written as links that point to those anchors.&lt;/p>
&lt;p>Creating a table of contents in a Jupyter notebook is quite easy. We can add it using HTML anchors. See the following steps:&lt;/p>
&lt;div id="step-1-select-the-markdown-format" class="section level2">
&lt;h2>Step 1: Select the Markdown Format&lt;/h2>
&lt;p>Open the Jupyter notebook and select the markdown cell format instead of the code.&lt;/p>
&lt;p>&lt;img src="Contentpic1.jpeg" />&lt;/p>
&lt;/div>
&lt;div id="step-2-create-the-structure-of-the-table-of-content" class="section level2">
&lt;h2>Step 2: Create the Structure of the Table of Content&lt;/h2>
&lt;p>First, create a table of contents using the markdown in the notebook. Here, we also need to link the anchors that we will create in the next step. Use the following text and paste it into the markdown cell:&lt;/p>
&lt;pre class="r">&lt;code>## Table of Contents
* [Chapter 1](#chapter1)
* [Section 1.1](#section_1_1)
* [Sub Section 1.1.1](#sub_section_1_1_1)
* [Chapter 2](#chapter2)
* [Section 2.1](#section_2_1)
* [Sub Section 2.1.1](#sub_section_2_1_1)
* [Sub Section 2.1.2](#sub_section_2_1_2)
* [Section 2.2](#section_2_2)
* [Sub Section 2.2.1](#sub_section_2_2_1)
* [Sub Section 2.2.2](#sub_section_2_2_2)
* [Chapter 3](#chapter3)
* [Section 3.1](#section_3_1)
* [Sub Section 3.1.1](#sub_section_3_1_1)
* [Sub Section 3.1.2](#sub_section_3_1_2)
* [Section 3.2](#section_3_2)
* [Sub Section 3.2.1](#sub_section_3_2_1)
* [Sub Section 3.2.2](#sub_section_3_2_2)&lt;/code>&lt;/pre>
&lt;p>Press &lt;strong>Shift + Enter&lt;/strong> to run the previous lines in the Jupyter notebook. The table of contents should display like this:&lt;/p>
&lt;p>&lt;img src="Contentpic2.jpeg" />&lt;/p>
&lt;p>Note that the displayed name of each link is enclosed in brackets &lt;code>[]&lt;/code>, and the reference to the anchor tag is placed in parentheses, prefixed with a hash symbol &lt;code>#&lt;/code>.&lt;/p>
&lt;/div>
&lt;div id="step-3-create-anchor-tags" class="section level2">
&lt;h2>Step 3: Create Anchor Tags&lt;/h2>
&lt;p>Now, we will create the anchor tags in order to link with the table of contents. Create the chapters, sections, and subsections. Enter the following text in the next markdown cell:&lt;/p>
&lt;pre class="r">&lt;code>## Chapter 1 &amp;lt;a class=&amp;quot;anchor&amp;quot; id=&amp;quot;chapter1&amp;quot;&amp;gt;&amp;lt;/a&amp;gt;
This is chapter number 1
### Section 1.1 &amp;lt;a class=&amp;quot;anchor&amp;quot; id=&amp;quot;section_1_1&amp;quot;&amp;gt;&amp;lt;/a&amp;gt;
This is section 1.1
#### Section 1.1.1 &amp;lt;a class=&amp;quot;anchor&amp;quot; id=&amp;quot;sub_section_1_1_1&amp;quot;&amp;gt;&amp;lt;/a&amp;gt;
This is sub section 1.1.1
## Chapter 2 &amp;lt;a class=&amp;quot;anchor&amp;quot; id=&amp;quot;chapter2&amp;quot;&amp;gt;&amp;lt;/a&amp;gt;
This is chapter number 2
### Section 2.1 &amp;lt;a class=&amp;quot;anchor&amp;quot; id=&amp;quot;section_2_1&amp;quot;&amp;gt;&amp;lt;/a&amp;gt;
This is section 2.1
#### Section 2.1.1 &amp;lt;a class=&amp;quot;anchor&amp;quot; id=&amp;quot;sub_section_2_1_1&amp;quot;&amp;gt;&amp;lt;/a&amp;gt;
This is sub section 2.1.1
#### Section 2.1.2 &amp;lt;a class=&amp;quot;anchor&amp;quot; id=&amp;quot;sub_section_2_1_2&amp;quot;&amp;gt;&amp;lt;/a&amp;gt;
This is sub section 2.1.2
### Section 2.2 &amp;lt;a class=&amp;quot;anchor&amp;quot; id=&amp;quot;section_2_2&amp;quot;&amp;gt;&amp;lt;/a&amp;gt;
This is section 2.2
#### Section 2.2.1 &amp;lt;a class=&amp;quot;anchor&amp;quot; id=&amp;quot;sub_section_2_2_1&amp;quot;&amp;gt;&amp;lt;/a&amp;gt;
This is sub section 2.2.1
#### Section 2.2.2 &amp;lt;a class=&amp;quot;anchor&amp;quot; id=&amp;quot;sub_section_2_2_2&amp;quot;&amp;gt;&amp;lt;/a&amp;gt;
This is sub section 2.2.2
## Chapter 3 &amp;lt;a class=&amp;quot;anchor&amp;quot; id=&amp;quot;chapter3&amp;quot;&amp;gt;&amp;lt;/a&amp;gt;
This is chapter number 3
### Section 3.1 &amp;lt;a class=&amp;quot;anchor&amp;quot; id=&amp;quot;section_3_1&amp;quot;&amp;gt;&amp;lt;/a&amp;gt;
This is section 3.1
#### Section 3.1.1 &amp;lt;a class=&amp;quot;anchor&amp;quot; id=&amp;quot;sub_section_3_1_1&amp;quot;&amp;gt;&amp;lt;/a&amp;gt;
This is sub section 3.1.1
#### Section 3.1.2 &amp;lt;a class=&amp;quot;anchor&amp;quot; id=&amp;quot;sub_section_3_1_2&amp;quot;&amp;gt;&amp;lt;/a&amp;gt;
This is sub section 3.1.2
### Section 3.2 &amp;lt;a class=&amp;quot;anchor&amp;quot; id=&amp;quot;section_3_2&amp;quot;&amp;gt;&amp;lt;/a&amp;gt;
This is section 3.2
#### Section 3.2.1 &amp;lt;a class=&amp;quot;anchor&amp;quot; id=&amp;quot;sub_section_3_2_1&amp;quot;&amp;gt;&amp;lt;/a&amp;gt;
This is sub section 3.2.1
#### Section 3.2.2 &amp;lt;a class=&amp;quot;anchor&amp;quot; id=&amp;quot;sub_section_3_2_2&amp;quot;&amp;gt;&amp;lt;/a&amp;gt;
This is sub section 3.2.2&lt;/code>&lt;/pre>
&lt;p>Press &lt;strong>Shift + Enter&lt;/strong> or run this cell to see the effects. The following output should display on your notebook:&lt;/p>
&lt;p>&lt;img src="Contentpic3.jpeg" />&lt;/p>
&lt;p>Here, you will notice that you can easily navigate to the desired section from the table of contents.&lt;/p>
&lt;p>Note that we can also add a table of contents in a Jupyter notebook using the pre-built toc2 extension: &lt;a href="https://jupyter-contrib-nbextensions.readthedocs.io/en/latest/nbextensions/toc2/README.html">click here&lt;/a>.&lt;/p>
&lt;/div>
&lt;/div></description></item><item><title>Python Tutorials</title><link>https://rajeshmajumderblog.netlify.app/blog/external-project/</link><pubDate>Fri, 14 Apr 2023 00:00:00 +0000</pubDate><guid>https://rajeshmajumderblog.netlify.app/blog/external-project/</guid><description/></item><item><title>C3</title><link>https://rajeshmajumderblog.netlify.app/blog/internal-project_x/</link><pubDate>Sun, 19 Jun 2022 00:00:00 +0000</pubDate><guid>https://rajeshmajumderblog.netlify.app/blog/internal-project_x/</guid><description>
&lt;script src="https://rajeshmajumderblog.netlify.app/blog/internal-project_x/index_files/header-attrs/header-attrs.js">&lt;/script>
&lt;script src="https://rajeshmajumderblog.netlify.app/blog/internal-project_x/index_files/htmlwidgets/htmlwidgets.js">&lt;/script>
&lt;script src="https://rajeshmajumderblog.netlify.app/blog/internal-project_x/index_files/d3/d3.min.js">&lt;/script>
&lt;link href="https://rajeshmajumderblog.netlify.app/blog/internal-project_x/index_files/c3/c3.min.css" rel="stylesheet" />
&lt;script src="https://rajeshmajumderblog.netlify.app/blog/internal-project_x/index_files/c3/c3.min.js">&lt;/script>
&lt;script src="https://rajeshmajumderblog.netlify.app/blog/internal-project_x/index_files/c3-binding/c3.js">&lt;/script>
&lt;div id="TOC">
&lt;ul>
&lt;li>&lt;a href="#instalation">Instalation&lt;/a>&lt;/li>
&lt;li>&lt;a href="#usage">Usage&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#the-basics">The Basics&lt;/a>&lt;/li>
&lt;li>&lt;a href="#piping">Piping&lt;/a>&lt;/li>
&lt;/ul>&lt;/li>
&lt;li>&lt;a href="#other-line-plots">Other Line Plots&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#spline">Spline&lt;/a>&lt;/li>
&lt;li>&lt;a href="#step">Step&lt;/a>&lt;/li>
&lt;/ul>&lt;/li>
&lt;li>&lt;a href="#bar-plots">Bar Plots&lt;/a>&lt;/li>
&lt;li>&lt;a href="#mixed-geometry-plots">Mixed Geometry Plots&lt;/a>&lt;/li>
&lt;li>&lt;a href="#secondary-y-axis">Secondary Y Axis&lt;/a>&lt;/li>
&lt;li>&lt;a href="#scatter-plot">Scatter Plot&lt;/a>&lt;/li>
&lt;li>&lt;a href="#pie-charts">Pie Charts&lt;/a>&lt;/li>
&lt;li>&lt;a href="#donut-charts">Donut Charts&lt;/a>&lt;/li>
&lt;li>&lt;a href="#gauge-charts">Gauge Charts&lt;/a>&lt;/li>
&lt;li>&lt;a href="#grid-lines-annotation">Grid Lines &amp;amp; Annotation&lt;/a>&lt;/li>
&lt;li>&lt;a href="#sub-chart">Sub-chart&lt;/a>&lt;/li>
&lt;li>&lt;a href="#color-palette">Color Palette&lt;/a>&lt;/li>
&lt;li>&lt;a href="#point-size">Point Size&lt;/a>&lt;/li>
&lt;li>&lt;a href="#on-click">On Click&lt;/a>&lt;/li>
&lt;li>&lt;a href="#tooltips">Tooltips&lt;/a>&lt;/li>
&lt;/ul>
&lt;/div>
&lt;p>The &lt;code>c3&lt;/code> package is a wrapper, or htmlwidget, for the C3 JavaScript charting library by Masayuki Tanaka. You will find this package useful if you want to create a chart using R and embed it in an R Markdown document or Shiny app.&lt;/p>
&lt;p>The &lt;code>C3&lt;/code> library is very versatile and includes a lot of options. Currently this package wraps most of the &lt;code>C3&lt;/code> options object; even with this limitation, a wide range of options is available.&lt;/p>
&lt;div id="instalation" class="section level2">
&lt;h2>Installation&lt;/h2>
&lt;pre class="r">&lt;code>install.packages(&amp;quot;c3&amp;quot;)
# or
devtools::install_github(&amp;quot;mrjoh3/c3&amp;quot;)&lt;/code>&lt;/pre>
&lt;/div>
&lt;div id="usage" class="section level2">
&lt;h2>Usage&lt;/h2>
&lt;p>The &lt;code>c3&lt;/code> package is intended to be as simple and lightweight as possible. As a starting point, the data input must be a &lt;code>data.frame&lt;/code> or &lt;code>tibble&lt;/code>, with two options for its structure.&lt;/p>
&lt;ul>
&lt;li>&lt;p>If a &lt;code>data.frame&lt;/code> without any options is passed all of the numeric columns will be plotted. This can be used in line and bar plots. Each column is a line or bar.&lt;/p>&lt;/li>
&lt;li>&lt;p>For more complex plots only 3 columns are used, those defined as &lt;code>x&lt;/code>, &lt;code>y&lt;/code> and &lt;code>group&lt;/code>. This requires a &lt;code>data.frame&lt;/code> with a vertical structure.&lt;/p>&lt;/li>
&lt;/ul>
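&lt;p>For example, a long-format &lt;code>data.frame&lt;/code> might be plotted like this (a sketch; the column names are arbitrary):&lt;/p>
&lt;pre class="r">&lt;code>library(c3)

# vertical (long) structure: one row per observation
df &amp;lt;- data.frame(date   = rep(seq(as.Date(&amp;quot;2014-01-01&amp;quot;), by = &amp;quot;month&amp;quot;, length.out = 12), 2),
                 value  = round(runif(24) * 10, 2),
                 series = rep(c(&amp;quot;a&amp;quot;, &amp;quot;b&amp;quot;), each = 12))

c3(df, x = &amp;#39;date&amp;#39;, y = &amp;#39;value&amp;#39;, group = &amp;#39;series&amp;#39;)&lt;/code>&lt;/pre>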
&lt;div id="the-basics" class="section level3">
&lt;h3>The Basics&lt;/h3>
&lt;p>Where no options are supplied, a simple line plot is produced by default. Where no x-axis is defined, the plots are sequential. A &lt;code>Date&lt;/code> x-axis can be parsed with no additional settings if it is in the format &lt;code>%Y-%m-%d&lt;/code> (i.e. ‘2014-01-01’).&lt;/p>
&lt;pre class="r">&lt;code>library(c3)&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## Warning: package &amp;#39;c3&amp;#39; was built under R version 4.1.3&lt;/code>&lt;/pre>
&lt;pre>&lt;code>##
## Attaching package: &amp;#39;c3&amp;#39;&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## The following objects are masked from &amp;#39;package:graphics&amp;#39;:
##
## grid, legend&lt;/code>&lt;/pre>
&lt;pre class="r">&lt;code>data = data.frame(a = abs(rnorm(20) * 10),
b = abs(rnorm(20) * 10),
date = seq(as.Date(&amp;quot;2011-01-01&amp;quot;), by = &amp;quot;month&amp;quot;, length.out = 20))
c3(data)&lt;/code>&lt;/pre>
&lt;div id="htmlwidget-1" style="width:672px;height:480px;" class="c3 html-widget">&lt;/div>
&lt;script type="application/json" data-for="htmlwidget-1">{"x":{"data":{"json":[{"a":18.8027,"b":2.1291},{"a":14.5937,"b":5.4323},{"a":0.9583,"b":0.2821},{"a":0.6722,"b":21.4296},{"a":11.7692,"b":6.8126},{"a":1.5531,"b":3.3644},{"a":2.4642,"b":9.9238},{"a":1.0663,"b":0.0159},{"a":0.4771,"b":6.8109},{"a":3.3604,"b":13.8445},{"a":6.2161,"b":2.2487},{"a":11.9811,"b":7.565},{"a":8.058,"b":7.3558},{"a":2.0488,"b":9.9505},{"a":10.6849,"b":7.8359},{"a":2.0984,"b":6.3121},{"a":3.058,"b":13.219},{"a":6.4198,"b":4.7071},{"a":9.9318,"b":12.6226},{"a":4.6906,"b":4.822}],"keys":{"value":["a","b"]}},"opts":{"x":null,"y":null,"types":{"a":"numeric","b":"numeric","date":"Date"}}},"evals":[],"jsHooks":[]}&lt;/script>
&lt;/div>
&lt;div id="piping" class="section level3">
&lt;h3>Piping&lt;/h3>
&lt;p>The package also imports the magrittr pipe operator &lt;code>%&amp;gt;%&lt;/code> to simplify syntax.&lt;/p>
&lt;pre class="r">&lt;code>data%&amp;gt;%c3()&lt;/code>&lt;/pre>
&lt;div id="htmlwidget-2" style="width:672px;height:480px;" class="c3 html-widget">&lt;/div>
&lt;script type="application/json" data-for="htmlwidget-2">{"x":{"data":{"json":[{"a":18.8027,"b":2.1291},{"a":14.5937,"b":5.4323},{"a":0.9583,"b":0.2821},{"a":0.6722,"b":21.4296},{"a":11.7692,"b":6.8126},{"a":1.5531,"b":3.3644},{"a":2.4642,"b":9.9238},{"a":1.0663,"b":0.0159},{"a":0.4771,"b":6.8109},{"a":3.3604,"b":13.8445},{"a":6.2161,"b":2.2487},{"a":11.9811,"b":7.565},{"a":8.058,"b":7.3558},{"a":2.0488,"b":9.9505},{"a":10.6849,"b":7.8359},{"a":2.0984,"b":6.3121},{"a":3.058,"b":13.219},{"a":6.4198,"b":4.7071},{"a":9.9318,"b":12.6226},{"a":4.6906,"b":4.822}],"keys":{"value":["a","b"]}},"opts":{"x":null,"y":null,"types":{"a":"numeric","b":"numeric","date":"Date"}}},"evals":[],"jsHooks":[]}&lt;/script>
&lt;/div>
&lt;/div>
&lt;div id="other-line-plots" class="section level2">
&lt;h2>Other Line Plots&lt;/h2>
&lt;p>There are 5 different line plots available:&lt;/p>
&lt;ul>
&lt;li>line&lt;/li>
&lt;li>spline&lt;/li>
&lt;li>step&lt;/li>
&lt;li>area&lt;/li>
&lt;li>area-step&lt;/li>
&lt;/ul>
&lt;div id="spline" class="section level3">
&lt;h3>Spline&lt;/h3>
&lt;pre class="r">&lt;code>data %&amp;gt;%
c3() %&amp;gt;%
c3_line(&amp;#39;spline&amp;#39;)&lt;/code>&lt;/pre>
&lt;div id="htmlwidget-3" style="width:672px;height:480px;" class="c3 html-widget">&lt;/div>
&lt;script type="application/json" data-for="htmlwidget-3">{"x":{"data":{"json":[{"a":18.8027,"b":2.1291},{"a":14.5937,"b":5.4323},{"a":0.9583,"b":0.2821},{"a":0.6722,"b":21.4296},{"a":11.7692,"b":6.8126},{"a":1.5531,"b":3.3644},{"a":2.4642,"b":9.9238},{"a":1.0663,"b":0.0159},{"a":0.4771,"b":6.8109},{"a":3.3604,"b":13.8445},{"a":6.2161,"b":2.2487},{"a":11.9811,"b":7.565},{"a":8.058,"b":7.3558},{"a":2.0488,"b":9.9505},{"a":10.6849,"b":7.8359},{"a":2.0984,"b":6.3121},{"a":3.058,"b":13.219},{"a":6.4198,"b":4.7071},{"a":9.9318,"b":12.6226},{"a":4.6906,"b":4.822}],"keys":{"value":["a","b"]},"type":"spline"},"opts":{"x":null,"y":null,"types":{"a":"numeric","b":"numeric","date":"Date"}}},"evals":[],"jsHooks":[]}&lt;/script>
&lt;/div>
&lt;div id="step" class="section level3">
&lt;h3>Step&lt;/h3>
&lt;pre class="r">&lt;code>data %&amp;gt;%
c3(x = &amp;#39;date&amp;#39;) %&amp;gt;%
c3_line(&amp;#39;area-step&amp;#39;)&lt;/code>&lt;/pre>
&lt;div id="htmlwidget-4" style="width:672px;height:480px;" class="c3 html-widget">&lt;/div>
&lt;script type="application/json" data-for="htmlwidget-4">{"x":{"data":{"x":"date","json":[{"date":"2011-01-01","a":18.8027,"b":2.1291},{"date":"2011-02-01","a":14.5937,"b":5.4323},{"date":"2011-03-01","a":0.9583,"b":0.2821},{"date":"2011-04-01","a":0.6722,"b":21.4296},{"date":"2011-05-01","a":11.7692,"b":6.8126},{"date":"2011-06-01","a":1.5531,"b":3.3644},{"date":"2011-07-01","a":2.4642,"b":9.9238},{"date":"2011-08-01","a":1.0663,"b":0.0159},{"date":"2011-09-01","a":0.4771,"b":6.8109},{"date":"2011-10-01","a":3.3604,"b":13.8445},{"date":"2011-11-01","a":6.2161,"b":2.2487},{"date":"2011-12-01","a":11.9811,"b":7.565},{"date":"2012-01-01","a":8.058,"b":7.3558},{"date":"2012-02-01","a":2.0488,"b":9.9505},{"date":"2012-03-01","a":10.6849,"b":7.8359},{"date":"2012-04-01","a":2.0984,"b":6.3121},{"date":"2012-05-01","a":3.058,"b":13.219},{"date":"2012-06-01","a":6.4198,"b":4.7071},{"date":"2012-07-01","a":9.9318,"b":12.6226},{"date":"2012-08-01","a":4.6906,"b":4.822}],"keys":{"value":["date","a","b"]},"type":"area-step"},"opts":{"x":"date","y":null,"types":{"a":"numeric","b":"numeric","date":"Date"}},"axis":{"x":{"label":"date","type":"timeseries"}}},"evals":[],"jsHooks":[]}&lt;/script>
&lt;/div>
&lt;/div>
&lt;div id="bar-plots" class="section level2">
&lt;h2>Bar Plots&lt;/h2>
&lt;pre class="r">&lt;code>data[1:10, ] %&amp;gt;%
c3() %&amp;gt;%
c3_bar(stacked = TRUE,
rotate = TRUE)&lt;/code>&lt;/pre>
&lt;div id="htmlwidget-5" style="width:672px;height:480px;" class="c3 html-widget">&lt;/div>
&lt;script type="application/json" data-for="htmlwidget-5">{"x":{"data":{"json":[{"a":18.8027,"b":2.1291},{"a":14.5937,"b":5.4323},{"a":0.9583,"b":0.2821},{"a":0.6722,"b":21.4296},{"a":11.7692,"b":6.8126},{"a":1.5531,"b":3.3644},{"a":2.4642,"b":9.9238},{"a":1.0663,"b":0.0159},{"a":0.4771,"b":6.8109},{"a":3.3604,"b":13.8445}],"keys":{"value":["a","b"]},"type":"bar","groups":{"value":["a","b"]}},"opts":{"x":null,"y":null,"types":{"a":"numeric","b":"numeric","date":"Date"}},"axis":{"x":{"type":"category"},"rotated":true},"bar":{"zerobased":true,"width":{"ratio":0.6}}},"evals":[],"jsHooks":[]}&lt;/script>
&lt;/div>
&lt;div id="mixed-geometry-plots" class="section level2">
&lt;h2>Mixed Geometry Plots&lt;/h2>
&lt;p>Mixed geometry currently works only with a wide (horizontal) &lt;code>data.frame&lt;/code>, where each numeric column is plotted as its own series.&lt;/p>
&lt;pre class="r">&lt;code>data$c &amp;lt;- abs(rnorm(20) *10)
data$d &amp;lt;- abs(rnorm(20) *10)
data %&amp;gt;%
c3() %&amp;gt;%
c3_mixedGeom(type = &amp;#39;bar&amp;#39;,
stacked = c(&amp;#39;b&amp;#39;,&amp;#39;d&amp;#39;),
types = list(a=&amp;#39;area&amp;#39;,
c=&amp;#39;spline&amp;#39;)
)&lt;/code>&lt;/pre>
&lt;div id="htmlwidget-6" style="width:672px;height:480px;" class="c3 html-widget">&lt;/div>
&lt;script type="application/json" data-for="htmlwidget-6">{"x":{"data":{"json":[{"a":18.8027,"b":2.1291,"c":10.4047,"d":9.3338},{"a":14.5937,"b":5.4323,"c":6.735,"d":0.8366},{"a":0.9583,"b":0.2821,"c":7.8497,"d":17.9726},{"a":0.6722,"b":21.4296,"c":6.3388,"d":7.1507},{"a":11.7692,"b":6.8126,"c":4.0074,"d":12.2171},{"a":1.5531,"b":3.3644,"c":3.2599,"d":3.5954},{"a":2.4642,"b":9.9238,"c":1.393,"d":8.1317},{"a":1.0663,"b":0.0159,"c":2.7268,"d":8.5254},{"a":0.4771,"b":6.8109,"c":8.0201,"d":16.6374},{"a":3.3604,"b":13.8445,"c":5.0411,"d":2.9788},{"a":6.2161,"b":2.2487,"c":8.981,"d":3.675},{"a":11.9811,"b":7.565,"c":5.2511,"d":6.3659},{"a":8.058,"b":7.3558,"c":1.5734,"d":11.8384},{"a":2.0488,"b":9.9505,"c":14.3538,"d":14.0533},{"a":10.6849,"b":7.8359,"c":1.1975,"d":12.9043},{"a":2.0984,"b":6.3121,"c":6.0863,"d":11.7934},{"a":3.058,"b":13.219,"c":11.8282,"d":9.567},{"a":6.4198,"b":4.7071,"c":7.055,"d":5.0574},{"a":9.9318,"b":12.6226,"c":0.0841,"d":0.6431},{"a":4.6906,"b":4.822,"c":4.2438,"d":6.5964}],"keys":{"value":["a","b","c","d"]},"type":"bar","types":{"a":"area","c":"spline"},"groups":["b","d"]},"opts":{"x":null,"y":null,"types":{"a":"numeric","b":"numeric","date":"Date","c":"numeric","d":"numeric"}}},"evals":[],"jsHooks":[]}&lt;/script>
&lt;/div>
&lt;div id="secondary-y-axis" class="section level2">
&lt;h2>Secondary Y Axis&lt;/h2>
&lt;p>To use a secondary Y axis, columns must first be assigned to an axis, and the secondary axis then made visible with &lt;code>y2Axis()&lt;/code>.&lt;/p>
&lt;pre class="r">&lt;code>data %&amp;gt;%
dplyr::select(date, a, b) %&amp;gt;%
c3(x = &amp;#39;date&amp;#39;,
axes = list(a = &amp;#39;y&amp;#39;,
b = &amp;#39;y2&amp;#39;)) %&amp;gt;%
c3_mixedGeom(types = list(a = &amp;#39;line&amp;#39;,
b = &amp;#39;area&amp;#39;)) %&amp;gt;%
y2Axis()&lt;/code>&lt;/pre>
&lt;div id="htmlwidget-7" style="width:672px;height:480px;" class="c3 html-widget">&lt;/div>
&lt;script type="application/json" data-for="htmlwidget-7">{"x":{"data":{"axes":{"a":"y","b":"y2"},"x":"date","json":[{"date":"2011-01-01","a":18.8027,"b":2.1291},{"date":"2011-02-01","a":14.5937,"b":5.4323},{"date":"2011-03-01","a":0.9583,"b":0.2821},{"date":"2011-04-01","a":0.6722,"b":21.4296},{"date":"2011-05-01","a":11.7692,"b":6.8126},{"date":"2011-06-01","a":1.5531,"b":3.3644},{"date":"2011-07-01","a":2.4642,"b":9.9238},{"date":"2011-08-01","a":1.0663,"b":0.0159},{"date":"2011-09-01","a":0.4771,"b":6.8109},{"date":"2011-10-01","a":3.3604,"b":13.8445},{"date":"2011-11-01","a":6.2161,"b":2.2487},{"date":"2011-12-01","a":11.9811,"b":7.565},{"date":"2012-01-01","a":8.058,"b":7.3558},{"date":"2012-02-01","a":2.0488,"b":9.9505},{"date":"2012-03-01","a":10.6849,"b":7.8359},{"date":"2012-04-01","a":2.0984,"b":6.3121},{"date":"2012-05-01","a":3.058,"b":13.219},{"date":"2012-06-01","a":6.4198,"b":4.7071},{"date":"2012-07-01","a":9.9318,"b":12.6226},{"date":"2012-08-01","a":4.6906,"b":4.822}],"keys":{"value":["date","a","b"]},"type":"line","types":{"a":"line","b":"area"}},"opts":{"x":"date","y":null,"types":{"date":"Date","a":"numeric","b":"numeric"}},"axis":{"x":{"label":"date","type":"timeseries"},"y2":{"show":true}}},"evals":[],"jsHooks":[]}&lt;/script>
&lt;/div>
&lt;div id="scatter-plot" class="section level2">
&lt;h2>Scatter Plot&lt;/h2>
&lt;pre class="r">&lt;code>mtcars %&amp;gt;%
c3(x = &amp;#39;mpg&amp;#39;,
y = &amp;#39;wt&amp;#39;,
group = &amp;#39;cyl&amp;#39;) %&amp;gt;%
c3_scatter()&lt;/code>&lt;/pre>
&lt;div id="htmlwidget-8" style="width:672px;height:480px;" class="c3 html-widget">&lt;/div>
&lt;script type="application/json" data-for="htmlwidget-8">{"x":{"data":{"json":[{"4":2.32,"6":2.62,"8":3.44,"4_x":22.8,"6_x":21,"8_x":18.7},{"4":3.19,"6":2.875,"8":3.57,"4_x":24.4,"6_x":21,"8_x":14.3},{"4":3.15,"6":3.215,"8":4.07,"4_x":22.8,"6_x":21.4,"8_x":16.4},{"4":2.2,"6":3.46,"8":3.73,"4_x":32.4,"6_x":18.1,"8_x":17.3},{"4":1.615,"6":3.44,"8":3.78,"4_x":30.4,"6_x":19.2,"8_x":15.2},{"4":1.835,"6":3.44,"8":5.25,"4_x":33.9,"6_x":17.8,"8_x":10.4},{"4":2.465,"6":2.77,"8":5.424,"4_x":21.5,"6_x":19.7,"8_x":10.4},{"4":1.935,"8":5.345,"4_x":27.3,"8_x":14.7},{"4":2.14,"8":3.52,"4_x":26,"8_x":15.5},{"4":1.513,"8":3.435,"4_x":30.4,"8_x":15.2},{"4":2.78,"8":3.84,"4_x":21.4,"8_x":13.3},{"8":3.845,"8_x":19.2},{"8":3.17,"8_x":15.8},{"8":3.57,"8_x":15}],"keys":{"value":["4","6","8","4_x","6_x","8_x"]},"xs":{"6":"6_x","4":"4_x","8":"8_x"},"type":"scatter"},"opts":{"x":"mpg","y":"wt","types":{"mpg":"numeric","cyl":"numeric","disp":"numeric","hp":"numeric","drat":"numeric","wt":"numeric","qsec":"numeric","vs":"numeric","am":"numeric","gear":"numeric","carb":"numeric"}},"axis":{"x":{"label":"mpg"},"y":{"label":"wt"}}},"evals":[],"jsHooks":[]}&lt;/script>
&lt;/div>
&lt;div id="pie-charts" class="section level2">
&lt;h2>Pie Charts&lt;/h2>
&lt;pre class="r">&lt;code>data.frame(India = 45,
Bangladesh = 20,
SriLanka = 10) %&amp;gt;%
c3() %&amp;gt;%
c3_pie()&lt;/code>&lt;/pre>
&lt;div id="htmlwidget-9" style="width:672px;height:480px;" class="c3 html-widget">&lt;/div>
&lt;script type="application/json" data-for="htmlwidget-9">{"x":{"data":{"json":[{"India":45,"Bangladesh":20,"SriLanka":10}],"keys":{"value":["India","Bangladesh","SriLanka"]},"type":"pie"},"opts":{"x":null,"y":null,"types":{"India":"numeric","Bangladesh":"numeric","SriLanka":"numeric"}},"pie":{"expand":true,"label":{"show":true,"threshold":null,"format":null}}},"evals":[],"jsHooks":[]}&lt;/script>
&lt;/div>
&lt;div id="donut-charts" class="section level2">
&lt;h2>Donut Charts&lt;/h2>
&lt;pre class="r">&lt;code>data.frame(red = 82, green = 33, blue = 93) %&amp;gt;%
c3(colors = list(red = &amp;#39;red&amp;#39;,
green = &amp;#39;green&amp;#39;,
blue = &amp;#39;blue&amp;#39;)) %&amp;gt;%
c3_donut(title = &amp;#39;#d053ee&amp;#39;)&lt;/code>&lt;/pre>
&lt;div id="htmlwidget-10" style="width:672px;height:480px;" class="c3 html-widget">&lt;/div>
&lt;script type="application/json" data-for="htmlwidget-10">{"x":{"data":{"colors":{"red":"red","green":"green","blue":"blue"},"json":[{"red":82,"green":33,"blue":93}],"keys":{"value":["red","green","blue"]},"type":"donut"},"opts":{"x":null,"y":null,"types":{"red":"numeric","green":"numeric","blue":"numeric"}},"donut":{"expand":true,"title":"#d053ee","label":{"show":true,"threshold":null,"format":null}}},"evals":[],"jsHooks":[]}&lt;/script>
&lt;/div>
&lt;div id="gauge-charts" class="section level2">
&lt;h2>Gauge Charts&lt;/h2>
&lt;pre class="r">&lt;code>data.frame(data = 80) %&amp;gt;%
c3() %&amp;gt;%
c3_gauge()&lt;/code>&lt;/pre>
&lt;div id="htmlwidget-11" style="width:672px;height:480px;" class="c3 html-widget">&lt;/div>
&lt;script type="application/json" data-for="htmlwidget-11">{"x":{"data":{"json":[{"data":80}],"keys":{"value":["data"]},"type":"gauge"},"opts":{"x":null,"y":null,"types":{"data":"numeric"}},"gauge":{"label":null,"min":0,"max":100,"units":null,"width":null},"color":{"pattern":["#FF0000","#F97600","#F6C600","#60B044"],"threshold":{"unit":"value","max":100,"values":[30,60,90,100]}},"size":{"height":null}},"evals":[],"jsHooks":[]}&lt;/script>
&lt;/div>
&lt;div id="grid-lines-annotation" class="section level2">
&lt;h2>Grid Lines &amp;amp; Annotation&lt;/h2>
&lt;pre class="r">&lt;code>data %&amp;gt;%
c3() %&amp;gt;%
grid(&amp;#39;y&amp;#39;) %&amp;gt;%
grid(&amp;#39;x&amp;#39;,
show = F,
lines = data.frame(value = c(3, 10),
text= c(&amp;#39;Line 1&amp;#39;,&amp;#39;Line 2&amp;#39;)))&lt;/code>&lt;/pre>
&lt;div id="htmlwidget-12" style="width:672px;height:480px;" class="c3 html-widget">&lt;/div>
&lt;script type="application/json" data-for="htmlwidget-12">{"x":{"data":{"json":[{"a":18.8027,"b":2.1291,"c":10.4047,"d":9.3338},{"a":14.5937,"b":5.4323,"c":6.735,"d":0.8366},{"a":0.9583,"b":0.2821,"c":7.8497,"d":17.9726},{"a":0.6722,"b":21.4296,"c":6.3388,"d":7.1507},{"a":11.7692,"b":6.8126,"c":4.0074,"d":12.2171},{"a":1.5531,"b":3.3644,"c":3.2599,"d":3.5954},{"a":2.4642,"b":9.9238,"c":1.393,"d":8.1317},{"a":1.0663,"b":0.0159,"c":2.7268,"d":8.5254},{"a":0.4771,"b":6.8109,"c":8.0201,"d":16.6374},{"a":3.3604,"b":13.8445,"c":5.0411,"d":2.9788},{"a":6.2161,"b":2.2487,"c":8.981,"d":3.675},{"a":11.9811,"b":7.565,"c":5.2511,"d":6.3659},{"a":8.058,"b":7.3558,"c":1.5734,"d":11.8384},{"a":2.0488,"b":9.9505,"c":14.3538,"d":14.0533},{"a":10.6849,"b":7.8359,"c":1.1975,"d":12.9043},{"a":2.0984,"b":6.3121,"c":6.0863,"d":11.7934},{"a":3.058,"b":13.219,"c":11.8282,"d":9.567},{"a":6.4198,"b":4.7071,"c":7.055,"d":5.0574},{"a":9.9318,"b":12.6226,"c":0.0841,"d":0.6431},{"a":4.6906,"b":4.822,"c":4.2438,"d":6.5964}],"keys":{"value":["a","b","c","d"]}},"opts":{"x":null,"y":null,"types":{"a":"numeric","b":"numeric","date":"Date","c":"numeric","d":"numeric"}},"grid":{"y":{"show":true},"x":{"show":false,"lines":{"value":[3,10],"text":["Line 1","Line 2"]}}}},"evals":[],"jsHooks":[]}&lt;/script>
&lt;/div>
&lt;div id="sub-chart" class="section level2">
&lt;h2>Sub-chart&lt;/h2>
&lt;pre class="r">&lt;code>data %&amp;gt;%
c3(x = &amp;#39;date&amp;#39;) %&amp;gt;%
subchart()&lt;/code>&lt;/pre>
&lt;div id="htmlwidget-13" style="width:672px;height:480px;" class="c3 html-widget">&lt;/div>
&lt;script type="application/json" data-for="htmlwidget-13">{"x":{"data":{"x":"date","json":[{"date":"2011-01-01","a":18.8027,"b":2.1291,"c":10.4047,"d":9.3338},{"date":"2011-02-01","a":14.5937,"b":5.4323,"c":6.735,"d":0.8366},{"date":"2011-03-01","a":0.9583,"b":0.2821,"c":7.8497,"d":17.9726},{"date":"2011-04-01","a":0.6722,"b":21.4296,"c":6.3388,"d":7.1507},{"date":"2011-05-01","a":11.7692,"b":6.8126,"c":4.0074,"d":12.2171},{"date":"2011-06-01","a":1.5531,"b":3.3644,"c":3.2599,"d":3.5954},{"date":"2011-07-01","a":2.4642,"b":9.9238,"c":1.393,"d":8.1317},{"date":"2011-08-01","a":1.0663,"b":0.0159,"c":2.7268,"d":8.5254},{"date":"2011-09-01","a":0.4771,"b":6.8109,"c":8.0201,"d":16.6374},{"date":"2011-10-01","a":3.3604,"b":13.8445,"c":5.0411,"d":2.9788},{"date":"2011-11-01","a":6.2161,"b":2.2487,"c":8.981,"d":3.675},{"date":"2011-12-01","a":11.9811,"b":7.565,"c":5.2511,"d":6.3659},{"date":"2012-01-01","a":8.058,"b":7.3558,"c":1.5734,"d":11.8384},{"date":"2012-02-01","a":2.0488,"b":9.9505,"c":14.3538,"d":14.0533},{"date":"2012-03-01","a":10.6849,"b":7.8359,"c":1.1975,"d":12.9043},{"date":"2012-04-01","a":2.0984,"b":6.3121,"c":6.0863,"d":11.7934},{"date":"2012-05-01","a":3.058,"b":13.219,"c":11.8282,"d":9.567},{"date":"2012-06-01","a":6.4198,"b":4.7071,"c":7.055,"d":5.0574},{"date":"2012-07-01","a":9.9318,"b":12.6226,"c":0.0841,"d":0.6431},{"date":"2012-08-01","a":4.6906,"b":4.822,"c":4.2438,"d":6.5964}],"keys":{"value":["date","a","b","c","d"]}},"opts":{"x":"date","y":null,"types":{"a":"numeric","b":"numeric","date":"Date","c":"numeric","d":"numeric"}},"axis":{"x":{"label":"date","type":"timeseries"}},"subchart":{"show":true,"size":{"height":20}}},"evals":[],"jsHooks":[]}&lt;/script>
&lt;/div>
&lt;div id="color-palette" class="section level2">
&lt;h2>Color Palette&lt;/h2>
&lt;p>Plot color palettes can be switched to &lt;code>RColorBrewer&lt;/code> or &lt;code>viridis&lt;/code> palettes using &lt;code>RColorBrewer()&lt;/code> (an S3 method) or &lt;code>c3_viridis()&lt;/code> respectively.&lt;/p>
&lt;pre class="r">&lt;code>data.frame(sugar = 20,
fat = 45,
salt = 10,
vegetables = 60) %&amp;gt;%
c3() %&amp;gt;%
c3_pie() %&amp;gt;%
RColorBrewer()&lt;/code>&lt;/pre>
&lt;div id="htmlwidget-14" style="width:672px;height:480px;" class="c3 html-widget">&lt;/div>
&lt;script type="application/json" data-for="htmlwidget-14">{"x":{"data":{"json":[{"sugar":20,"fat":45,"salt":10,"vegetables":60}],"keys":{"value":["sugar","fat","salt","vegetables"]},"type":"pie"},"opts":{"x":null,"y":null,"types":{"sugar":"numeric","fat":"numeric","salt":"numeric","vegetables":"numeric"}},"pie":{"expand":true,"label":{"show":true,"threshold":null,"format":null}},"color":{"pattern":["#D7191C","#FDAE61","#ABDDA4","#2B83BA"]}},"evals":[],"jsHooks":[]}&lt;/script>
&lt;pre class="r">&lt;code>data.frame(sugar = 20,
fat = 45,
salt = 10,
vegetables = 60) %&amp;gt;%
c3() %&amp;gt;%
c3_pie() %&amp;gt;%
c3_viridis()&lt;/code>&lt;/pre>
&lt;div id="htmlwidget-15" style="width:672px;height:480px;" class="c3 html-widget">&lt;/div>
&lt;script type="application/json" data-for="htmlwidget-15">{"x":{"data":{"json":[{"sugar":20,"fat":45,"salt":10,"vegetables":60}],"keys":{"value":["sugar","fat","salt","vegetables"]},"type":"pie"},"opts":{"x":null,"y":null,"types":{"sugar":"numeric","fat":"numeric","salt":"numeric","vegetables":"numeric"}},"pie":{"expand":true,"label":{"show":true,"threshold":null,"format":null}},"color":{"pattern":["#440154","#31688E","#35B779","#FDE725"]}},"evals":[],"jsHooks":[]}&lt;/script>
&lt;/div>
&lt;div id="point-size" class="section level2">
&lt;h2>Point Size&lt;/h2>
&lt;pre class="r">&lt;code>data %&amp;gt;%
c3(x = &amp;#39;date&amp;#39;) %&amp;gt;%
point_options(r = 6,
expand.r = 2)&lt;/code>&lt;/pre>
&lt;div id="htmlwidget-16" style="width:672px;height:480px;" class="c3 html-widget">&lt;/div>
&lt;script type="application/json" data-for="htmlwidget-16">{"x":{"data":{"x":"date","json":[{"date":"2011-01-01","a":18.8027,"b":2.1291,"c":10.4047,"d":9.3338},{"date":"2011-02-01","a":14.5937,"b":5.4323,"c":6.735,"d":0.8366},{"date":"2011-03-01","a":0.9583,"b":0.2821,"c":7.8497,"d":17.9726},{"date":"2011-04-01","a":0.6722,"b":21.4296,"c":6.3388,"d":7.1507},{"date":"2011-05-01","a":11.7692,"b":6.8126,"c":4.0074,"d":12.2171},{"date":"2011-06-01","a":1.5531,"b":3.3644,"c":3.2599,"d":3.5954},{"date":"2011-07-01","a":2.4642,"b":9.9238,"c":1.393,"d":8.1317},{"date":"2011-08-01","a":1.0663,"b":0.0159,"c":2.7268,"d":8.5254},{"date":"2011-09-01","a":0.4771,"b":6.8109,"c":8.0201,"d":16.6374},{"date":"2011-10-01","a":3.3604,"b":13.8445,"c":5.0411,"d":2.9788},{"date":"2011-11-01","a":6.2161,"b":2.2487,"c":8.981,"d":3.675},{"date":"2011-12-01","a":11.9811,"b":7.565,"c":5.2511,"d":6.3659},{"date":"2012-01-01","a":8.058,"b":7.3558,"c":1.5734,"d":11.8384},{"date":"2012-02-01","a":2.0488,"b":9.9505,"c":14.3538,"d":14.0533},{"date":"2012-03-01","a":10.6849,"b":7.8359,"c":1.1975,"d":12.9043},{"date":"2012-04-01","a":2.0984,"b":6.3121,"c":6.0863,"d":11.7934},{"date":"2012-05-01","a":3.058,"b":13.219,"c":11.8282,"d":9.567},{"date":"2012-06-01","a":6.4198,"b":4.7071,"c":7.055,"d":5.0574},{"date":"2012-07-01","a":9.9318,"b":12.6226,"c":0.0841,"d":0.6431},{"date":"2012-08-01","a":4.6906,"b":4.822,"c":4.2438,"d":6.5964}],"keys":{"value":["date","a","b","c","d"]}},"opts":{"x":"date","y":null,"types":{"a":"numeric","b":"numeric","date":"Date","c":"numeric","d":"numeric"}},"axis":{"x":{"label":"date","type":"timeseries"}},"point":{"show":true,"r":6,"focus":{"expand":{"enabled":true,"r":12}},"select":{"r":24}}},"evals":[],"jsHooks":[]}&lt;/script>
&lt;/div>
&lt;div id="on-click" class="section level2">
&lt;h2>On Click&lt;/h2>
&lt;p>The &lt;code>onclick&lt;/code>, &lt;code>onmouseover&lt;/code> and &lt;code>onmouseout&lt;/code> callbacks are all available via the &lt;code>c3&lt;/code> function. To use one, pass a JavaScript function as a character string to &lt;code>htmlwidgets::JS()&lt;/code>. See the &lt;code>C3.js&lt;/code> documentation and examples for details; the example below should be enough to get you started.&lt;/p>
&lt;pre class="r">&lt;code>data %&amp;gt;%
c3(onclick = htmlwidgets::JS(&amp;#39;function(d, element){console.log(d)}&amp;#39;))&lt;/code>&lt;/pre>
&lt;div id="htmlwidget-17" style="width:672px;height:480px;" class="c3 html-widget">&lt;/div>
&lt;script type="application/json" data-for="htmlwidget-17">{"x":{"data":{"onclick":"function(d, element){console.log(d)}","json":[{"a":18.8027,"b":2.1291,"c":10.4047,"d":9.3338},{"a":14.5937,"b":5.4323,"c":6.735,"d":0.8366},{"a":0.9583,"b":0.2821,"c":7.8497,"d":17.9726},{"a":0.6722,"b":21.4296,"c":6.3388,"d":7.1507},{"a":11.7692,"b":6.8126,"c":4.0074,"d":12.2171},{"a":1.5531,"b":3.3644,"c":3.2599,"d":3.5954},{"a":2.4642,"b":9.9238,"c":1.393,"d":8.1317},{"a":1.0663,"b":0.0159,"c":2.7268,"d":8.5254},{"a":0.4771,"b":6.8109,"c":8.0201,"d":16.6374},{"a":3.3604,"b":13.8445,"c":5.0411,"d":2.9788},{"a":6.2161,"b":2.2487,"c":8.981,"d":3.675},{"a":11.9811,"b":7.565,"c":5.2511,"d":6.3659},{"a":8.058,"b":7.3558,"c":1.5734,"d":11.8384},{"a":2.0488,"b":9.9505,"c":14.3538,"d":14.0533},{"a":10.6849,"b":7.8359,"c":1.1975,"d":12.9043},{"a":2.0984,"b":6.3121,"c":6.0863,"d":11.7934},{"a":3.058,"b":13.219,"c":11.8282,"d":9.567},{"a":6.4198,"b":4.7071,"c":7.055,"d":5.0574},{"a":9.9318,"b":12.6226,"c":0.0841,"d":0.6431},{"a":4.6906,"b":4.822,"c":4.2438,"d":6.5964}],"keys":{"value":["a","b","c","d"]}},"opts":{"x":null,"y":null,"types":{"a":"numeric","b":"numeric","date":"Date","c":"numeric","d":"numeric"}}},"evals":["data.onclick"],"jsHooks":[]}&lt;/script>
&lt;/div>
&lt;div id="tooltips" class="section level2">
&lt;h2>Tooltips&lt;/h2>
&lt;p>C3 tooltips are readily modified with JavaScript functions. For further detail see the &lt;code>C3.js&lt;/code> documentation, or for more advanced usage the &lt;code>C3.js&lt;/code> examples page.&lt;/p>
&lt;pre class="r">&lt;code>library(&amp;quot;htmlwidgets&amp;quot;)&lt;/code>&lt;/pre>
&lt;pre class="r">&lt;code>data %&amp;gt;%
c3() %&amp;gt;%
tooltip(format = list(title = JS(&amp;quot;function (x) { return &amp;#39;Data &amp;#39; + x; }&amp;quot;),
name = JS(&amp;#39;function (name, ratio, id, index) { return name; }&amp;#39;),
value = JS(&amp;#39;function (value, ratio, id, index) { return ratio; }&amp;#39;)))&lt;/code>&lt;/pre>
&lt;div id="htmlwidget-18" style="width:672px;height:480px;" class="c3 html-widget">&lt;/div>
&lt;script type="application/json" data-for="htmlwidget-18">{"x":{"data":{"json":[{"a":18.8027,"b":2.1291,"c":10.4047,"d":9.3338},{"a":14.5937,"b":5.4323,"c":6.735,"d":0.8366},{"a":0.9583,"b":0.2821,"c":7.8497,"d":17.9726},{"a":0.6722,"b":21.4296,"c":6.3388,"d":7.1507},{"a":11.7692,"b":6.8126,"c":4.0074,"d":12.2171},{"a":1.5531,"b":3.3644,"c":3.2599,"d":3.5954},{"a":2.4642,"b":9.9238,"c":1.393,"d":8.1317},{"a":1.0663,"b":0.0159,"c":2.7268,"d":8.5254},{"a":0.4771,"b":6.8109,"c":8.0201,"d":16.6374},{"a":3.3604,"b":13.8445,"c":5.0411,"d":2.9788},{"a":6.2161,"b":2.2487,"c":8.981,"d":3.675},{"a":11.9811,"b":7.565,"c":5.2511,"d":6.3659},{"a":8.058,"b":7.3558,"c":1.5734,"d":11.8384},{"a":2.0488,"b":9.9505,"c":14.3538,"d":14.0533},{"a":10.6849,"b":7.8359,"c":1.1975,"d":12.9043},{"a":2.0984,"b":6.3121,"c":6.0863,"d":11.7934},{"a":3.058,"b":13.219,"c":11.8282,"d":9.567},{"a":6.4198,"b":4.7071,"c":7.055,"d":5.0574},{"a":9.9318,"b":12.6226,"c":0.0841,"d":0.6431},{"a":4.6906,"b":4.822,"c":4.2438,"d":6.5964}],"keys":{"value":["a","b","c","d"]}},"opts":{"x":null,"y":null,"types":{"a":"numeric","b":"numeric","date":"Date","c":"numeric","d":"numeric"}},"tooltip":{"show":true,"grouped":true,"format":{"title":"function (x) { return 'Data ' + x; }","name":"function (name, ratio, id, index) { return name; }","value":"function (value, ratio, id, index) { return ratio; }"}}},"evals":["tooltip.format.title","tooltip.format.name","tooltip.format.value"],"jsHooks":[]}&lt;/script>
&lt;/div></description></item><item><title>Basic introduction to SQL with MySQL</title><link>https://rajeshmajumderblog.netlify.app/blog/internal-project_ix/</link><pubDate>Wed, 11 May 2022 00:00:00 +0000</pubDate><guid>https://rajeshmajumderblog.netlify.app/blog/internal-project_ix/</guid><description>
&lt;p>This tutorial series was planned for undergraduate students who are completely new to SQL. Throughout the course we mainly used MySQL and MySQL queries, with R and Python used occasionally. Most of the tutorials in this lecture series are given here; the rest of the tutorials and course work will be available soon.&lt;/p>
&lt;ul>
&lt;li>&lt;ol style="list-style-type: decimal">
&lt;li>&lt;a href="https://drive.google.com/file/d/1zT1o_37lN6wIJvmPmONzatcp4EhZjS-6/view?usp=sharig">Tutorial-I Introduction (pdf)&lt;/a>&lt;/li>
&lt;/ol>&lt;/li>
&lt;li>&lt;ol start="2" style="list-style-type: decimal">
&lt;li>&lt;a href="https://drive.google.com/file/d/1pHy3ls6IK5r4QN8OAM3zPRsSd_Bm6lSn/view?usp=sharing">Tutorial-II Where can we write SQL Code ? (pdf)&lt;/a>&lt;/li>
&lt;/ol>&lt;/li>
&lt;li>&lt;ol start="3" style="list-style-type: decimal">
&lt;li>&lt;a href="https://drive.google.com/file/d/1IarwG7R3c8JRNI0CkdktONjUfWLjUsRB/view?usp=sharing">Tutorial-III MYSQL Terminologies (pdf)&lt;/a>&lt;/li>
&lt;/ol>&lt;/li>
&lt;li>&lt;ol start="4" style="list-style-type: decimal">
&lt;li>&lt;a href="https://drive.google.com/file/d/1xGj0GzsFWiU_GK8CkVcRn5pKcYxjW_2R/view?usp=sharing">Tutorial-IV Querying Basics (pdf)&lt;/a>&lt;/li>
&lt;/ol>&lt;/li>
&lt;li>&lt;ol start="5" style="list-style-type: decimal">
&lt;li>&lt;a href="https://drive.google.com/file/d/1SYAb3Q7DYn3zgZZEHNlfC8zOmuvZ1x7K/view?usp=sharing">Tutorial-V Creating, Updating, Deleting (pdf)&lt;/a>&lt;/li>
&lt;/ol>&lt;/li>
&lt;li>&lt;ol start="6" style="list-style-type: decimal">
&lt;li>&lt;a href="https://drive.google.com/file/d/1n4Rf0Hhir963VRzF1qdnBlb9rgamnI8i/view?usp=sharing">Tutorial-VI Data types (pdf)&lt;/a>&lt;/li>
&lt;/ol>&lt;/li>
&lt;li>&lt;ol start="7" style="list-style-type: decimal">
&lt;li>&lt;a href="https://drive.google.com/file/d/1gGgx_37aO3wl1s32Lj0GZgyTZSLcrulM/view?usp=sharing">Tutorial-VII Operators and Functions (pdf)&lt;/a>&lt;/li>
&lt;/ol>&lt;/li>
&lt;/ul></description></item><item><title>Survival Analysis with R</title><link>https://rajeshmajumderblog.netlify.app/blog/internal-project_viii/</link><pubDate>Fri, 08 Apr 2022 00:00:00 +0000</pubDate><guid>https://rajeshmajumderblog.netlify.app/blog/internal-project_viii/</guid><description>
&lt;script src="https://rajeshmajumderblog.netlify.app/blog/internal-project_viii/index_files/header-attrs/header-attrs.js">&lt;/script>
&lt;div id="TOC">
&lt;ul>
&lt;li>&lt;a href="#introduction">Introduction&lt;/a>&lt;/li>
&lt;li>&lt;a href="#examples-of-survival-data">Examples of Survival Data&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#a-the-lung-dataset">(A) The lung dataset&lt;/a>&lt;/li>
&lt;li>&lt;a href="#b-the-alloauto-dataset">(B) The alloauto dataset&lt;/a>&lt;/li>
&lt;/ul>&lt;/li>
&lt;li>&lt;a href="#some-basic-definations-which-are-used-in-survival-studies">Some Basic Definations which are used in Survival Studies:&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#cumulative-distribution">Cumulative Distribution&lt;/a>&lt;/li>
&lt;li>&lt;a href="#survival-function">Survival Function&lt;/a>&lt;/li>
&lt;li>&lt;a href="#failure-rate-or-hazard-rate">Failure rate or Hazard rate&lt;/a>&lt;/li>
&lt;/ul>&lt;/li>
&lt;li>&lt;a href="#what-is-censoring">What is Censoring ?&lt;/a>&lt;/li>
&lt;li>&lt;a href="#censored-survival-data">Censored survival data&lt;/a>&lt;/li>
&lt;li>&lt;a href="#distribution-of-follow-up-time">Distribution of follow-up time&lt;/a>&lt;/li>
&lt;li>&lt;a href="#components-of-survival-data">Components of survival data&lt;/a>&lt;/li>
&lt;li>&lt;a href="#dealing-with-dates-in-r">Dealing with dates in R&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#formating-dates">Formating dates&lt;/a>&lt;/li>
&lt;li>&lt;a href="#calculating-survival-times">Calculating Survival Times&lt;/a>&lt;/li>
&lt;/ul>&lt;/li>
&lt;li>&lt;a href="#event-indicator-and-creating-survival-objects">Event indicator and Creating survival objects&lt;/a>&lt;/li>
&lt;li>&lt;a href="#estimating-survival-curves-and-survival-probabilities-with-kaplan-meier-method">Estimating Survival curves and Survival probabilities with Kaplan-Meier method&lt;/a>&lt;/li>
&lt;li>&lt;a href="#kaplan-meier-plots">Kaplan-Meier Plots&lt;/a>&lt;/li>
&lt;li>&lt;a href="#estimating-x-years-survival">Estimating x-years survival&lt;/a>&lt;/li>
&lt;li>&lt;a href="#testing-of-survival-curves">Testing of survival curves&lt;/a>&lt;/li>
&lt;li>&lt;a href="#coxs-proportional-hazard-regression-model">Cox’s Proportional Hazard Regression model&lt;/a>&lt;/li>
&lt;li>&lt;a href="#competing-risks">Competing Risks&lt;/a>&lt;/li>
&lt;li>&lt;a href="#cumulative-incidence-in-melanoma-data">Cumulative incidence in Melanoma data&lt;/a>&lt;/li>
&lt;li>&lt;a href="#plot-the-cumulative-incidence-cif">Plot the Cumulative incidence (CIF)&lt;/a>&lt;/li>
&lt;li>&lt;a href="#plot-the-cumulative-incidence-cif-manually">Plot the Cumulative incidence (CIF) manually&lt;/a>&lt;/li>
&lt;li>&lt;a href="#compare-cumultive-incidence-between-groups">Compare cumultive incidence between groups&lt;/a>&lt;/li>
&lt;li>&lt;a href="#plot-the-cumulative-incidence-cif-between-groups-manually">Plot the Cumulative incidence (CIF) between groups manually&lt;/a>&lt;/li>
&lt;li>&lt;a href="#competing-risks-regression">Competing risks regression&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#a-competing-risks-regression-in-melanoma-data--subdistribution-hazard-approach">(A) Competing risks regression in Melanoma data- subdistribution hazard approach&lt;/a>&lt;/li>
&lt;li>&lt;a href="#b-competing-risks-regression-in-melanoma-data--cause-specific-hazard-approach">(B) Competing risks regression in Melanoma data- Cause-specific hazard approach&lt;/a>&lt;/li>
&lt;/ul>&lt;/li>
&lt;/ul>
&lt;/div>
&lt;ul>
&lt;li>&lt;strong>This class will provide theoretical as well as hands-on instruction and exercises covering basic survival analysis using R.&lt;/strong>&lt;/li>
&lt;li>&lt;strong>Some References for further reading:&lt;/strong>
&lt;ul>
&lt;li>&lt;span style="text-decoration:underline">&lt;em>Clark, T., Bradhurn, M., Love, S., &amp;amp; Altman, D. (2003). Survival analysis part I: Basic concepts and first analysis.232-238.ISSN 0007-0920.&lt;/em>&lt;/span>&lt;/li>
&lt;li>&lt;span style="text-decoration:underline">&lt;em>Clark, T., Bradhurn, M., Love, S., &amp;amp; Altman, D. (2003). Survival analysis part II: Multivariate data analysis- an introduction to concepts and methods. British Journal of Cancer, 89(3),431-436.&lt;/em>&lt;/span>&lt;/li>
&lt;li>&lt;span style="text-decoration:underline">&lt;em>Clark, T., Bradhurn, M., Love, S., &amp;amp; Altman, D. (2003). Survival analysis part III: Multivariate data analysis- choosing a model and assessing its adequacy and fit. British Journal of Cancer, 89(4),605-11.&lt;/em>&lt;/span>&lt;/li>
&lt;li>&lt;span style="text-decoration:underline">&lt;em>Clark, T., Bradhurn, M., Love, S., &amp;amp; Altman, D. (2003). Survival analysis part IV: Farther concepts and methods in survival analysis. 781-786.ISSN 0007-0920.&lt;/em>&lt;/span>&lt;/li>
&lt;/ul>&lt;/li>
&lt;li>&lt;strong>It is assumed that readers are familiar with R programming. If not, get some basic R programming knowledge first and then come back.&lt;/strong>&lt;/li>
&lt;li>&lt;strong>In this tutorial I have used some random toy data sets as well as some of R’s inbuilt data sets.&lt;/strong>&lt;/li>
&lt;li>&lt;strong>Some packages we’ll be using here include:&lt;/strong>
&lt;ul>
&lt;li>&lt;em>lubridate&lt;/em>&lt;/li>
&lt;li>&lt;em>survival&lt;/em>&lt;/li>
&lt;li>&lt;em>cmprsk&lt;/em> and some others.&lt;/li>
&lt;/ul>&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Let’s start…&lt;/strong>&lt;/p>
&lt;div id="introduction" class="section level1">
&lt;h1>Introduction&lt;/h1>
&lt;p>Generally, Survival Analysis is a collection of statistical procedures for analysing data in which the outcome variable of interest is &lt;span style="text-decoration:underline">&lt;strong>time until an event occurs&lt;/strong>&lt;/span>. For example, suppose we want to study how diabetes rates differ between males and females. Here we would use basic categorical data analysis – comparing proportions (risks, rates, etc.) between groups using a chi-square or Fisher exact test, or logistic regression. Note that in this kind of analysis you implicitly assume the rates are constant over the period of the study, or within the groups you defined.&lt;/p>
&lt;p>But in longitudinal studies, where you track subjects from one time point (e.g., entry into a study, diagnosis, start of a treatment) until you observe some outcome event (e.g., death, onset of disease, relapse), it doesn’t make sense to assume the rates are constant. For example, the risk of death after heart surgery is highest immediately post-op, decreases as the patient recovers, and then rises slowly again as the patient ages. Likewise, the recurrence rates of different cancers vary greatly over time and depend on tumor genetics, treatment, and other environmental factors.&lt;/p>
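&lt;p>The distinction above can be made concrete with a minimal sketch: in R, time-to-event outcomes are usually bundled into a survival object with &lt;code>Surv()&lt;/code> from the &lt;em>survival&lt;/em> package (the follow-up times and event indicators below are made-up illustrative values).&lt;/p>

```r
# Minimal sketch: representing censored time-to-event data in R.
# The times and event indicators here are made-up illustrative values.
library(survival)

time   <- c(5, 12, 20, 31)  # follow-up time for four subjects
status <- c(1, 0, 1, 0)     # 1 = event observed, 0 = censored

# Surv() bundles time and status into a survival object;
# censored observations print with a trailing "+".
s <- Surv(time, status)
print(s)  # 5  12+ 20  31+
```

&lt;p>A censored time such as &lt;code>12+&lt;/code> means the subject was followed for 12 time units without the event occurring – exactly the information a constant-rate analysis would throw away.&lt;/p>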
&lt;div id="examples-for-time-to-event-data" class="section level4">
&lt;h4>Examples for Time-to-Event data&lt;/h4>
&lt;p>&lt;strong>Examples from cancer&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Time from surgery to death&lt;/li>
&lt;li>Time from start of treatment to progression&lt;/li>
&lt;li>Time from response to recurrence&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Examples from other fields&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Time from HIV infection to development of AIDS&lt;/li>
&lt;li>Time to heart attack&lt;/li>
&lt;li>Time to onset of substance abuse&lt;/li>
&lt;li>Time to initiation of sexual activity&lt;/li>
&lt;li>Time to machine malfunction&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Aliases for survival analysis&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Reliability analysis&lt;/li>
&lt;li>Duration analysis&lt;/li>
&lt;li>Event history analysis&lt;/li>
&lt;li>Time-to-event analysis&lt;/li>
&lt;/ul>
&lt;/div>
&lt;/div>
&lt;div id="examples-of-survival-data" class="section level1">
&lt;h1>Examples of Survival Data&lt;/h1>
&lt;p>In the following you can see what survival data look like. For this, I’ve used some of R’s built-in datasets, which are available in different packages.&lt;/p>
&lt;div id="a-the-lung-dataset" class="section level2">
&lt;h2>(A) The lung dataset&lt;/h2>
&lt;p>The &lt;em>lung&lt;/em> dataset is available inside the &lt;em>survival&lt;/em> package in R. The data contain subjects with advanced lung cancer from the North Central Cancer Treatment Group.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>time:&lt;/strong> Survival time in days&lt;/li>
&lt;li>&lt;strong>status:&lt;/strong> censoring status 1=censored, 2=dead&lt;/li>
&lt;li>&lt;strong>sex:&lt;/strong> Male=1, Female=2&lt;/li>
&lt;/ul>
&lt;p>I’m ignoring the other variables for simplicity. You can explore those by yourself.&lt;/p>
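A preview of these columns can be pulled directly in R; the survival package ships with the dataset:

```r
library(survival)   # provides the lung dataset

# First few rows of the three variables described above
head(lung[, c("time", "status", "age")])
```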
&lt;table>
&lt;thead>
&lt;tr>
&lt;th style="text-align:right;">
time
&lt;/th>
&lt;th style="text-align:right;">
status
&lt;/th>
&lt;th style="text-align:right;">
age
&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td style="text-align:right;">
306
&lt;/td>
&lt;td style="text-align:right;">
2
&lt;/td>
&lt;td style="text-align:right;">
74
&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:right;">
455
&lt;/td>
&lt;td style="text-align:right;">
2
&lt;/td>
&lt;td style="text-align:right;">
68
&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:right;">
1010
&lt;/td>
&lt;td style="text-align:right;">
1
&lt;/td>
&lt;td style="text-align:right;">
56
&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:right;">
210
&lt;/td>
&lt;td style="text-align:right;">
2
&lt;/td>
&lt;td style="text-align:right;">
57
&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:right;">
883
&lt;/td>
&lt;td style="text-align:right;">
2
&lt;/td>
&lt;td style="text-align:right;">
60
&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:right;">
1022
&lt;/td>
&lt;td style="text-align:right;">
1
&lt;/td>
&lt;td style="text-align:right;">
74
&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;/div>
&lt;div id="b-the-alloauto-dataset" class="section level2">
&lt;h2>(B) The alloauto dataset&lt;/h2>
&lt;p>Consider the &lt;em>alloauto&lt;/em> dataset in the &lt;em>KMsurv&lt;/em> package in R. It contains measurements on leukemia patients treated with allogeneic or autologous transplantation.&lt;/p>
&lt;p>&lt;span style="text-decoration:underline">&lt;em>Klein and Moeschberger (1997), Survival Analysis: Techniques for Censored and Truncated Data, Springer; Kardaun, Statistica Neerlandica 37 (1983), 103–126.&lt;/em>&lt;/span>&lt;/p>
&lt;ul>
&lt;li>&lt;strong>time:&lt;/strong> &lt;em>Time to death or relapse, in months&lt;/em>&lt;/li>
&lt;li>&lt;strong>delta&lt;/strong> &lt;span class="math inline">\((\delta)\)&lt;/span>: 0 = alive without relapse (censored), 1 = dead or relapsed (event)&lt;/li>
&lt;li>&lt;strong>type:&lt;/strong> Transplant type, 1 = allogeneic, 2 = autologous&lt;/li>
&lt;/ul>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th style="text-align:right;">
time
&lt;/th>
&lt;th style="text-align:right;">
type
&lt;/th>
&lt;th style="text-align:right;">
delta
&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td style="text-align:right;">
0.030
&lt;/td>
&lt;td style="text-align:right;">
1
&lt;/td>
&lt;td style="text-align:right;">
1
&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:right;">
0.493
&lt;/td>
&lt;td style="text-align:right;">
1
&lt;/td>
&lt;td style="text-align:right;">
1
&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:right;">
0.855
&lt;/td>
&lt;td style="text-align:right;">
1
&lt;/td>
&lt;td style="text-align:right;">
1
&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:right;">
1.184
&lt;/td>
&lt;td style="text-align:right;">
1
&lt;/td>
&lt;td style="text-align:right;">
1
&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:right;">
1.283
&lt;/td>
&lt;td style="text-align:right;">
1
&lt;/td>
&lt;td style="text-align:right;">
1
&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:right;">
1.480
&lt;/td>
&lt;td style="text-align:right;">
1
&lt;/td>
&lt;td style="text-align:right;">
1
&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;/div>
&lt;/div>
&lt;div id="some-basic-definations-which-are-used-in-survival-studies" class="section level1">
&lt;h1>Some Basic Definitions Used in Survival Studies&lt;/h1>
&lt;p>Let &lt;span class="math inline">\(T\)&lt;/span> be the failure time (survival time), a non-negative random variable, and let &lt;span class="math inline">\((t_1,t_2,t_3,...,t_n)\)&lt;/span> be the &lt;span class="math inline">\(n\)&lt;/span> observations.&lt;/p>
&lt;div id="cumulative-distribution" class="section level3">
&lt;h3>Cumulative Distribution&lt;/h3>
&lt;p>&lt;span class="math display">\[F(t)=P(T\leq t)\]&lt;/span>&lt;/p>
&lt;/div>
&lt;div id="survival-function" class="section level3">
&lt;h3>Survival Function&lt;/h3>
&lt;p>The survival function is the probability that an individual survives beyond time &lt;span class="math inline">\(t\)&lt;/span>; that is, the probability that the event of interest (e.g., death) has not yet occurred by time &lt;span class="math inline">\(t\)&lt;/span>. Writing &lt;span class="math inline">\(T\)&lt;/span> for the time of death, &lt;span class="math inline">\(P(T&amp;gt;t)\)&lt;/span> is the probability that the time of death is greater than some time &lt;span class="math inline">\(t\)&lt;/span>.&lt;/p>
&lt;p>So, &lt;span class="math display">\[\text{Survival Function}=S(t)= P(T&amp;gt;t)= 1-F(t)\]&lt;/span>&lt;/p>
&lt;p>&lt;strong>Characteristics of&lt;/strong> &lt;span class="math inline">\(S(t)\)&lt;/span>:&lt;/p>
&lt;ul>
&lt;li>&lt;span class="math inline">\(S(t)=1\:\: \text{,if}\:\;t&amp;lt;0\)&lt;/span>&lt;/li>
&lt;li>&lt;span class="math inline">\(S(\infty)=\lim_{t \to \infty} S(t)=0\)&lt;/span>&lt;/li>
&lt;li>&lt;span class="math inline">\(S(t)\: \text{is non increasing in}\:t\)&lt;/span>&lt;/li>
&lt;/ul>
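For a fully observed (uncensored) sample, these characteristics are easy to verify empirically: estimate the survival function as the proportion of observations exceeding each time point. A small sketch with toy times:

```r
# Toy uncensored survival times
t.obs <- c(2, 5, 7, 10, 12)

# Empirical survival function: proportion still surviving beyond t
S <- function(t) mean(t.obs > t)

s.vals <- sapply(0:13, S)
s.vals                    # starts at 1, ends at 0
all(diff(s.vals) <= 0)    # non-increasing in t, as required
```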
&lt;/div>
&lt;div id="failure-rate-or-hazard-rate" class="section level3">
&lt;h3>Failure rate or Hazard rate&lt;/h3>
&lt;p>The failure rate or hazard rate of an item at time point &lt;span class="math inline">\(t\)&lt;/span> is usually denoted by &lt;span class="math inline">\(\gamma(t)\)&lt;/span>, &lt;span class="math inline">\(r(t)\)&lt;/span> or &lt;span class="math inline">\(h(t)\)&lt;/span>. In fact, it is the instantaneous probability rate that an item functioning up to time point &lt;span class="math inline">\(t\)&lt;/span> will fail at that instant.&lt;/p>
&lt;p>&lt;span class="math display">\[\gamma(t)/h(t)=\lim_{{\Delta}t \to 0} \frac{P(t \leq T&amp;lt; t+{\Delta}t {\mid} T{\geq}t)}{{\Delta}t}\]&lt;/span>&lt;/p>
&lt;p>&lt;strong>This hazard function&lt;/strong>&lt;span class="math inline">\(\{h(t)\}\)&lt;/span> &lt;strong>can be written in term of&lt;/strong> &lt;span style="text-decoration:underline">&lt;em>Cumulative distribution function&lt;/em>&lt;/span> &lt;strong>&amp;amp;&lt;/strong> &lt;span style="text-decoration:underline">&lt;em>Survival function&lt;/em>&lt;/span>:&lt;/p>
&lt;p>&lt;span class="math display">\[\gamma(t)/h(t)= \frac{\frac{d}{dt}\{F(t)\}}{1-F(t)}= \frac{-\frac{d}{dt}\{S(t)\}}{S(t)}\]&lt;/span>&lt;/p>
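A quick numerical sanity check of this identity: for the exponential distribution the hazard \(f(t)/S(t)\) should come out constant and equal to the rate parameter.

```r
# Hazard of an exponential distribution, computed as f(t) / S(t)
rate <- 0.5
t <- seq(0.1, 5, by = 0.1)
h <- dexp(t, rate) / (1 - pexp(t, rate))
range(h)   # both ends equal the rate, 0.5: a constant hazard
```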
&lt;div id="types-of-hazard" class="section level4">
&lt;h4>Types of Hazard&lt;/h4>
&lt;p>The hazard function may increase, decrease, remain constant, or follow a more complicated pattern.&lt;/p>
&lt;table>
&lt;thead>
&lt;tr class="header">
&lt;th align="left">Hazard Nane&lt;/th>
&lt;th align="center">Example&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr class="odd">
&lt;td align="left">Increasing Hazard(IFR)&lt;/td>
&lt;td align="center">Patients with acute leukemia who do not respond to treatment have an increasing hazard.&lt;/td>
&lt;/tr>
&lt;tr class="even">
&lt;td align="left">Decreasing Hazard(DFR)&lt;/td>
&lt;td align="center">Risk of soldiers, wounded by bullets who undergo survey, The main danger is the operation itself and this danger decreases if the surgery is successful.&lt;/td>
&lt;/tr>
&lt;tr class="odd">
&lt;td align="left">Constant Hazard&lt;/td>
&lt;td align="center">The risk of healthy persons between 18 to 40 years of age whose main risk of death are accidents.&lt;/td>
&lt;/tr>
&lt;tr class="even">
&lt;td align="left">Bathtub curve&lt;/td>
&lt;td align="center">Describes the process of human life. During an initial period, the risk is high(&lt;em>high Infant Mortality&lt;/em>). Subsequently, &lt;span class="math inline">\(\gamma(t)\)&lt;/span> stays approximately constant until a certain time, after which it increases because of were-out failures.&lt;/td>
&lt;/tr>
&lt;tr class="odd">
&lt;td align="left">Increasing &amp;amp; Decreasing Hazard&lt;/td>
&lt;td align="center">Patients with tuberculosis have risks that increase initially, then decrease after treatment.&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>&lt;img src="WhatsApp%20Image%202022-05-07%20at%209.21.42%20AM.jpeg" />&lt;/p>
&lt;p>The &lt;strong>Kaplan-Meier&lt;/strong> curve illustrates the survival function. It’s a step function illustrating the cumulative survival probability over time. The curve is horizontal over periods where no event occurs, then drops vertically corresponding to a change in the survival function at each time an event occurs.&lt;/p>
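A sketch of how such a step curve can be drawn with the survival package (the survfit() function is covered in detail in later sections):

```r
library(survival)

# Overall Kaplan-Meier fit for the lung data, then the step-function plot
f <- survfit(Surv(time, status) ~ 1, data = lung)
plot(f, xlab = "Days", ylab = "Survival probability",
     main = "Kaplan-Meier curve for the lung data")
```

The dashed lines plot() adds by default are the pointwise 95% confidence bands.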
&lt;/div>
&lt;/div>
&lt;/div>
&lt;div id="what-is-censoring" class="section level1">
&lt;h1>What is Censoring?&lt;/h1>
&lt;p>Censoring is a type of missing-data problem unique to survival analysis. It happens when you track a subject through the end of the study and the event never occurs. It can also happen when a subject drops out of the study for reasons other than death, or is otherwise lost to follow-up. The observation is censored in that you only know the individual survived up to the loss to follow-up, but you know nothing about survival after that.&lt;/p>
&lt;p>&lt;img src="WhatsApp%20Image%202022-05-07%20at%209.21.41%20AM.jpeg" />&lt;/p>
&lt;p>Depending on the direction from which the incompleteness in the observations comes, censoring is of three types:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Right Censoring&lt;/strong>&lt;/li>
&lt;li>&lt;strong>Left Censoring&lt;/strong>&lt;/li>
&lt;li>&lt;strong>Interval Censoring&lt;/strong>&lt;/li>
&lt;/ul>
&lt;table>
&lt;thead>
&lt;tr class="header">
&lt;th align="left">Type of Censoring&lt;/th>
&lt;th align="center">Description&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr class="odd">
&lt;td align="left">Right Censoring&lt;/td>
&lt;td align="center">Here the lifetime of an item is followed until some time at which the event (i.e., &lt;em>failure&lt;/em> or &lt;em>death&lt;/em>) is yet to occur; but the event takes no farther part in the study after the time. Example: Measuring the survival years of a patient with lung cancer; but he died in a car accident after &lt;span class="math inline">\(t\)&lt;/span> years.&lt;/td>
&lt;/tr>
&lt;tr class="even">
&lt;td align="left">Left Censoring&lt;/td>
&lt;td align="center">This occurs when the event of interest has already taken place at the time of observation; but the exact time of occurence of the event is not known. Example: Infection with a sexually transmitted like HIV/AIDS.&lt;/td>
&lt;/tr>
&lt;tr class="odd">
&lt;td align="left">Interval Censoring&lt;/td>
&lt;td align="center">It reflects uncertainty as to the exact times the units failed within an interval. This type of data frequently comes from tests or situations where the objects of interest are not constantly monitored.&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
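In the survival package, each of these censoring types can be encoded with the Surv() function; a minimal sketch with toy times:

```r
library(survival)

# Right censoring: observed time plus event indicator (1 = event, 0 = censored)
Surv(c(5, 8, 12), c(1, 0, 1))

# Left censoring: 0 marks subjects whose event happened before the observed time
Surv(c(3, 6), c(0, 1), type = "left")

# Interval censoring: the event is only known to lie between time and time2
Surv(time = c(2, 4), time2 = c(3, 7), type = "interval2")
```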
&lt;/div>
&lt;div id="censored-survival-data" class="section level1">
&lt;h1>Censored survival data&lt;/h1>
&lt;p>Let’s create a toy example for survival data in R.&lt;/p>
&lt;pre class="r">&lt;code>my.data = data.frame(id=c(1:13), # Patient id
time=c(3,2,3,5,1,0.5,4.5,3.3,2,3.6,1.4,5.0,4.7), # Study years
Status=factor(c(&amp;quot;E&amp;quot;,&amp;quot;C&amp;quot;,&amp;quot;C&amp;quot;,&amp;quot;E&amp;quot;,&amp;quot;E&amp;quot;,&amp;quot;C&amp;quot;,&amp;quot;C&amp;quot;,&amp;quot;C&amp;quot;,&amp;quot;E&amp;quot;,&amp;quot;C&amp;quot;,&amp;quot;E&amp;quot;,&amp;quot;C&amp;quot;,&amp;quot;C&amp;quot;)) # Censored or Event status
)
head(my.data)
# Visualizing the Data
SurvPlot(time = my.data$time,
status = my.data$Status,
C=&amp;quot;C&amp;quot;,E=&amp;quot;E&amp;quot;,
text.adjs = 0.7,point.cex = 3,
title = &amp;quot;Survival Plot for My Data&amp;quot;,
legend.posi = &amp;quot;bottomright&amp;quot;) # This is my created function &lt;/code>&lt;/pre>
&lt;pre>&lt;code>## id time Status
## 1 1 3.0 E
## 2 2 2.0 C
## 3 3 3.0 C
## 4 4 5.0 E
## 5 5 1.0 E
## 6 6 0.5 C&lt;/code>&lt;/pre>
&lt;p>&lt;img src="https://rajeshmajumderblog.netlify.app/blog/internal-project_viii/index_files/figure-html/unnamed-chunk-4-1.png" width="672" />&lt;/p>
&lt;p>In this example, how would we compute the proportion who are event-free at 4 years ?&lt;/p>
&lt;p>Subjects 4, 7, 12 &amp;amp; 13 were &lt;strong>event-free&lt;/strong> at 4 years. Subjects 1, 5, 9 &amp;amp; 11 had the &lt;strong>event before 4 years&lt;/strong>. Subjects 2, 3, 6, 8 and 10 were &lt;strong>censored before 4 years&lt;/strong>, so we don’t know whether or not they had the event by 4 years. How do we incorporate these subjects into our estimate?&lt;/p>
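One principled answer is the Kaplan-Meier estimator developed in the sections that follow; as a preview, it weights the censored subjects appropriately for us:

```r
library(survival)

# Re-create the toy data from above
my.data <- data.frame(
  id     = 1:13,
  time   = c(3, 2, 3, 5, 1, 0.5, 4.5, 3.3, 2, 3.6, 1.4, 5.0, 4.7),
  Status = c("E", "C", "C", "E", "E", "C", "C", "C", "E", "C", "E", "C", "C")
)

# Kaplan-Meier fit; Status == "E" is the event indicator
f <- survfit(Surv(time, Status == "E") ~ 1, data = my.data)
summary(f, times = 4)   # estimated event-free proportion at 4 years
```

Working the product out by hand gives \((11/12)(10/11)(9/10)(7/8)=21/32\approx 0.656\) event-free at 4 years.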
&lt;/div>
&lt;div id="distribution-of-follow-up-time" class="section level1">
&lt;h1>Distribution of follow-up time&lt;/h1>
&lt;p>Censored subjects still provide information, so they must be appropriately included in the analysis. The distribution of follow-up times is skewed and may differ between censored patients and those with events. Follow-up times are always positive.&lt;/p>
&lt;p>Let’s draw the Histogram and density plot(using Kernel density estimation technique) for &lt;strong>Censored&lt;/strong> and &lt;strong>Event&lt;/strong> from the above toy example.&lt;/p>
&lt;pre class="r">&lt;code># Histogram
pp=function(...){
with(subset(my.data,Status==&amp;quot;C&amp;quot;),hist(time,col= adjustcolor(&amp;quot;red&amp;quot;, alpha.f = 0.20)))
par(new=T)
with(subset(my.data,Status==&amp;quot;E&amp;quot;),hist(time,col=adjustcolor(&amp;quot;blue&amp;quot;, alpha.f = 0.20),axes = F))
}
par(mar=c(5.1,4.1,4.1,7))
pp()
legend(x=5.3,y=3,
legend = c(&amp;quot;Censor&amp;quot;,&amp;quot;Event&amp;quot;),
fill=c(adjustcolor(&amp;quot;red&amp;quot;, alpha.f = 0.20),
adjustcolor(&amp;quot;blue&amp;quot;, alpha.f = 0.20)),
title = &amp;quot;Survival&amp;quot;,
xpd=T)
par(mar=c(5.1,4.1,4.1,2.1),xpd=NA)
# Density Plot
density.plot=function(...){
cens=density(my.data$time[my.data$Status==&amp;quot;C&amp;quot;]) # fitting Kernel density for Censored data
event=density(my.data$time[my.data$Status==&amp;quot;E&amp;quot;]) # fitting Kernel density for Event data
plot(cens,main=&amp;quot;&amp;quot;,xlab = &amp;quot;&amp;quot;,ylab = &amp;quot;&amp;quot;,...)
par(new=T)
polygon(cens,col = adjustcolor(&amp;quot;red&amp;quot;, alpha.f = 0.20))
par(new=T)
plot(event,main = &amp;quot;Density Plots of time for Censored &amp;amp; Event&amp;quot;,
xlab=&amp;quot;time&amp;quot;,
ylab=&amp;quot;Frequency&amp;quot;,
axes=F)
par(new=T)
polygon(event,col=adjustcolor(&amp;quot;blue&amp;quot;, alpha.f = 0.20))
}
par(mar=c(5.1,4.1,4.1,7))
density.plot()
legend(x=8.2,y=0.25,
legend = c(&amp;quot;Censor&amp;quot;,&amp;quot;Event&amp;quot;),
fill=c(adjustcolor(&amp;quot;red&amp;quot;, alpha.f = 0.20),
adjustcolor(&amp;quot;blue&amp;quot;, alpha.f = 0.20)),
title = &amp;quot;Survival&amp;quot;,
xpd=T)
par(mar=c(5.1,4.1,4.1,2.1),xpd=NA)&lt;/code>&lt;/pre>
&lt;p>&lt;img src="https://rajeshmajumderblog.netlify.app/blog/internal-project_viii/index_files/figure-html/unnamed-chunk-6-1.png" width="672" />&lt;img src="https://rajeshmajumderblog.netlify.app/blog/internal-project_viii/index_files/figure-html/unnamed-chunk-6-2.png" width="672" />&lt;/p>
&lt;p>Note that here I’ve used the &lt;em>adjustcolor()&lt;/em> function to add transparency to the base R colors.&lt;/p>
&lt;/div>
&lt;div id="components-of-survival-data" class="section level1">
&lt;h1>Components of survival data&lt;/h1>
&lt;p>For subject &lt;span class="math inline">\(i\)&lt;/span>:&lt;/p>
&lt;ul>
&lt;li>Event time &lt;span class="math inline">\(T_i\)&lt;/span>&lt;/li>
&lt;li>Censoring time &lt;span class="math inline">\(C_i\)&lt;/span>&lt;/li>
&lt;li>Event indicator &lt;span class="math inline">\(\delta_i\)&lt;/span>:
&lt;ul>
&lt;li>1 if event observed (i.e., &lt;span class="math inline">\(T_i\le C_i\)&lt;/span>)&lt;/li>
&lt;li>0 if censored (i.e., &lt;span class="math inline">\(T_i&amp;gt; C_i\)&lt;/span>)&lt;/li>
&lt;/ul>&lt;/li>
&lt;li>Observed time &lt;span class="math inline">\(Y_i=\min(T_i,C_i)\)&lt;/span>&lt;/li>
&lt;/ul>
&lt;p>The observed times and an event indicator are provided in the &lt;em>lung&lt;/em> data.&lt;/p>
&lt;ul>
&lt;li>time: Survival time in days &lt;span class="math inline">\((Y_i)\)&lt;/span>&lt;/li>
&lt;li>status: censoring status 1=censored, 2=dead &lt;span class="math inline">\((\delta_i)\)&lt;/span>&lt;/li>
&lt;/ul>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th style="text-align:right;">
inst
&lt;/th>
&lt;th style="text-align:right;">
time
&lt;/th>
&lt;th style="text-align:right;">
status
&lt;/th>
&lt;th style="text-align:right;">
age
&lt;/th>
&lt;th style="text-align:right;">
sex
&lt;/th>
&lt;th style="text-align:right;">
ph.ecog
&lt;/th>
&lt;th style="text-align:right;">
ph.karno
&lt;/th>
&lt;th style="text-align:right;">
pat.karno
&lt;/th>
&lt;th style="text-align:right;">
meal.cal
&lt;/th>
&lt;th style="text-align:right;">
wt.loss
&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td style="text-align:right;">
3
&lt;/td>
&lt;td style="text-align:right;">
306
&lt;/td>
&lt;td style="text-align:right;">
2
&lt;/td>
&lt;td style="text-align:right;">
74
&lt;/td>
&lt;td style="text-align:right;">
1
&lt;/td>
&lt;td style="text-align:right;">
1
&lt;/td>
&lt;td style="text-align:right;">
90
&lt;/td>
&lt;td style="text-align:right;">
100
&lt;/td>
&lt;td style="text-align:right;">
1175
&lt;/td>
&lt;td style="text-align:right;">
NA
&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:right;">
3
&lt;/td>
&lt;td style="text-align:right;">
455
&lt;/td>
&lt;td style="text-align:right;">
2
&lt;/td>
&lt;td style="text-align:right;">
68
&lt;/td>
&lt;td style="text-align:right;">
1
&lt;/td>
&lt;td style="text-align:right;">
0
&lt;/td>
&lt;td style="text-align:right;">
90
&lt;/td>
&lt;td style="text-align:right;">
90
&lt;/td>
&lt;td style="text-align:right;">
1225
&lt;/td>
&lt;td style="text-align:right;">
15
&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:right;">
3
&lt;/td>
&lt;td style="text-align:right;">
1010
&lt;/td>
&lt;td style="text-align:right;">
1
&lt;/td>
&lt;td style="text-align:right;">
56
&lt;/td>
&lt;td style="text-align:right;">
1
&lt;/td>
&lt;td style="text-align:right;">
0
&lt;/td>
&lt;td style="text-align:right;">
90
&lt;/td>
&lt;td style="text-align:right;">
90
&lt;/td>
&lt;td style="text-align:right;">
NA
&lt;/td>
&lt;td style="text-align:right;">
15
&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:right;">
5
&lt;/td>
&lt;td style="text-align:right;">
210
&lt;/td>
&lt;td style="text-align:right;">
2
&lt;/td>
&lt;td style="text-align:right;">
57
&lt;/td>
&lt;td style="text-align:right;">
1
&lt;/td>
&lt;td style="text-align:right;">
1
&lt;/td>
&lt;td style="text-align:right;">
90
&lt;/td>
&lt;td style="text-align:right;">
60
&lt;/td>
&lt;td style="text-align:right;">
1150
&lt;/td>
&lt;td style="text-align:right;">
11
&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:right;">
1
&lt;/td>
&lt;td style="text-align:right;">
883
&lt;/td>
&lt;td style="text-align:right;">
2
&lt;/td>
&lt;td style="text-align:right;">
60
&lt;/td>
&lt;td style="text-align:right;">
1
&lt;/td>
&lt;td style="text-align:right;">
0
&lt;/td>
&lt;td style="text-align:right;">
100
&lt;/td>
&lt;td style="text-align:right;">
90
&lt;/td>
&lt;td style="text-align:right;">
NA
&lt;/td>
&lt;td style="text-align:right;">
0
&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:right;">
12
&lt;/td>
&lt;td style="text-align:right;">
1022
&lt;/td>
&lt;td style="text-align:right;">
1
&lt;/td>
&lt;td style="text-align:right;">
74
&lt;/td>
&lt;td style="text-align:right;">
1
&lt;/td>
&lt;td style="text-align:right;">
1
&lt;/td>
&lt;td style="text-align:right;">
50
&lt;/td>
&lt;td style="text-align:right;">
80
&lt;/td>
&lt;td style="text-align:right;">
513
&lt;/td>
&lt;td style="text-align:right;">
0
&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;/div>
&lt;div id="dealing-with-dates-in-r" class="section level1">
&lt;h1>Dealing with dates in R&lt;/h1>
&lt;p>Data will often come with start and end dates rather than pre-calculated survival times. The first step is to make sure these are formatted as dates in R.&lt;/p>
&lt;p>Let’s create a small example dataset with &lt;em>‘start.date’&lt;/em> for surgery date and &lt;em>‘last.followup.date’&lt;/em> for the last follow-up date.&lt;/p>
&lt;pre class="r">&lt;code>date_ex=data.frame(start.date=c(&amp;quot;2007-06-22&amp;quot;,&amp;quot;2004-02-12&amp;quot;,&amp;quot;2010-11-03&amp;quot;),
last.followup.date=c(&amp;quot;2017-04-15&amp;quot;,&amp;quot;2018-07-04&amp;quot;,&amp;quot;2016-10-31&amp;quot;))
str(date_ex)&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## &amp;#39;data.frame&amp;#39;: 3 obs. of 2 variables:
## $ start.date : chr &amp;quot;2007-06-22&amp;quot; &amp;quot;2004-02-12&amp;quot; &amp;quot;2010-11-03&amp;quot;
## $ last.followup.date: chr &amp;quot;2017-04-15&amp;quot; &amp;quot;2018-07-04&amp;quot; &amp;quot;2016-10-31&amp;quot;&lt;/code>&lt;/pre>
&lt;p>We see these are both character variables, which will often be the case, but we need them formatted as dates. Here I show two methods: one using the base R &lt;em>as.Date()&lt;/em> function and another using the &lt;em>lubridate&lt;/em> package.&lt;/p>
&lt;div id="formating-dates" class="section level3">
&lt;h3>Formatting dates&lt;/h3>
&lt;pre class="r">&lt;code>#####################################
## ##
##--Using base function as.Date()--##
## ##
#####################################
date_ex$start.date=as.Date(date_ex$start.date,format = &amp;quot;%Y-%m-%d&amp;quot;)
date_ex$last.followup.date=as.Date(date_ex$last.followup.date,format = &amp;quot;%Y-%m-%d&amp;quot;)
date_ex
str(date_ex)
##################################################
## ##
##--Using ymd() func. inside lubridate package--##
## ##
##################################################
library(lubridate)
date_ex$start.date=ymd(date_ex$start.date)
date_ex$last.followup.date=ymd(date_ex$last.followup.date)
date_ex
str(date_ex)&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## start.date last.followup.date
## 1 2007-06-22 2017-04-15
## 2 2004-02-12 2018-07-04
## 3 2010-11-03 2016-10-31&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## &amp;#39;data.frame&amp;#39;: 3 obs. of 2 variables:
## $ start.date : Date, format: &amp;quot;2007-06-22&amp;quot; &amp;quot;2004-02-12&amp;quot; ...
## $ last.followup.date: Date, format: &amp;quot;2017-04-15&amp;quot; &amp;quot;2018-07-04&amp;quot; ...&lt;/code>&lt;/pre>
&lt;p>Note that in base R the format must include the separators as well as the symbols, e.g., if your date is in the format &lt;em>m/d/Y&lt;/em> then you would need &lt;em>format = “%m/%d/%Y”&lt;/em>; on the other hand, for the &lt;em>ymd()&lt;/em> function in the &lt;em>lubridate&lt;/em> package the separators do not need to be specified.&lt;/p>
&lt;/div>
&lt;div id="calculating-survival-times" class="section level3">
&lt;h3>Calculating Survival Times&lt;/h3>
&lt;p>Now, to calculate the survival times we need the difference between the start and end dates. For this, base R has a function called &lt;em>difftime()&lt;/em> which gives the number of days between two dates. Use the as.numeric() function to convert the differences into numeric values. Finally, to convert to years, divide by 365.25, the average number of days in a year.&lt;/p>
&lt;p>On the other hand, using the &lt;em>lubridate&lt;/em> package, the operator &lt;em>%–%&lt;/em> designates a time interval, which is then converted to the number of elapsed seconds using &lt;em>as.duration()&lt;/em> and finally converted to years by dividing by &lt;em>dyears(1)&lt;/em>, which gives the number of seconds in a year.&lt;/p>
&lt;pre class="r">&lt;code>#####################################
## ##
##--Using base function as.Date()--##
## ##
#####################################
date_ex$time=round(as.numeric(difftime(date_ex$last.followup.date,
date_ex$start.date,
units = &amp;quot;days&amp;quot;))/365.25,2)
date_ex
###############################
## ##
##--Using lubridate package--##
## ##
###############################
date_ex$time=as.duration(date_ex$start.date %--% date_ex$last.followup.date)/dyears(1)
date_ex&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## start.date last.followup.date time
## 1 2007-06-22 2017-04-15 9.82
## 2 2004-02-12 2018-07-04 14.39
## 3 2010-11-03 2016-10-31 5.99&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div id="event-indicator-and-creating-survival-objects" class="section level1">
&lt;h1>Event indicator and Creating survival objects&lt;/h1>
&lt;p>In the &lt;strong>Components of survival data&lt;/strong> section I mentioned the event indicator:&lt;/p>
&lt;p>Event indicator &lt;span class="math inline">\(\delta_i\)&lt;/span>:&lt;/p>
&lt;ul>
&lt;li>1 if event observed (i.e., &lt;span class="math inline">\(T_i\le C_i\)&lt;/span>)&lt;/li>
&lt;li>0 if censored (i.e., &lt;span class="math inline">\(T_i&amp;gt; C_i\)&lt;/span>)&lt;/li>
&lt;/ul>
&lt;p>In R, the &lt;em>Surv()&lt;/em> function in the survival package creates a survival object. There is one entry for each subject: the survival time, followed by a &lt;strong>‘+’&lt;/strong> if the subject was censored. To create the survival object, you give the &lt;em>Surv()&lt;/em> function the time variable and the status variable indicating whether the subject was censored.&lt;/p>
&lt;p>Let’s look at the first 10 observations of the survival objects for the &lt;em>lung&lt;/em> dataset:&lt;/p>
&lt;pre class="r">&lt;code>library(survival)
Surv(lung$time,lung$status)[1:10]&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## Warning: package &amp;#39;survival&amp;#39; was built under R version 4.1.3&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] 306 455 1010+ 210 883 1022+ 310 361 218 166&lt;/code>&lt;/pre>
&lt;/div>
&lt;div id="estimating-survival-curves-and-survival-probabilities-with-kaplan-meier-method" class="section level1">
&lt;h1>Estimating Survival curves and Survival probabilities with Kaplan-Meier method&lt;/h1>
&lt;p>Now our interest is to estimate the survival probability at a certain time, say &lt;span class="math inline">\(t\)&lt;/span>. The survival probability at a certain time, &lt;span class="math inline">\(S(t)\)&lt;/span>, is built from conditional probabilities of surviving beyond each time, given that an individual has survived just prior to it.
There are various techniques available; one of them is &lt;strong>Product-Limit estimation/Kaplan-Meier estimation&lt;/strong>.&lt;/p>
&lt;div id="kaplan-meier-estimator-product-limitpl-estimator" class="section level4">
&lt;h4>Kaplan-Meier Estimator/ Product Limit(PL) Estimator&lt;/h4>
&lt;p>Let &lt;span class="math inline">\(t_1,t_2,...,t_n\)&lt;/span> be uncensored sample observations of failure times. Then a non-parametric estimate of the survival function &lt;span class="math inline">\(S(t)\)&lt;/span> at the time point &lt;span class="math inline">\(t\)&lt;/span> is given by,&lt;/p>
&lt;p>&lt;span class="math display">\[R_n(t)=\frac{\# \text{Observations}&amp;gt;t}{n}\]&lt;/span> &lt;span class="math display">\[R_n(t)=\frac{\# T_i&amp;gt;t}{n}\:\: \text{where}\:T_i\:\forall i=1(1)n \: \: \text{are uncensored R.V.&amp;#39;s}\]&lt;/span>
This is basically the &lt;strong>complementary empirical distribution function&lt;/strong> at time &lt;span class="math inline">\(t\)&lt;/span>.&lt;/p>
&lt;p>Note that &lt;span class="math inline">\(R_n(t)\)&lt;/span> is a UMVUE (uniformly minimum variance unbiased), consistent, and efficient estimator of &lt;span class="math inline">\(S(t)\)&lt;/span>.&lt;/p>
&lt;p>But usually we cannot expect fully uncensored failure data, due to many practical limitations; that’s why we need some modification. The modified estimator is called the Product-Limit estimator or Kaplan-Meier estimator.&lt;/p>
&lt;p>Let there be &lt;span class="math inline">\(n\)&lt;/span> items and &lt;span class="math inline">\(k(\leq n)\)&lt;/span> distinct failure times &lt;span class="math inline">\(t_1 \leq t_2 \leq .... \leq t_k\)&lt;/span> observed.&lt;/p>
&lt;p>Let, &lt;span class="math display">\[d_j=\# \text{failures at time}\: t_j \:\:\: ,\forall j=1(1)k\]&lt;/span>
&lt;span class="math display">\[n_j=\# \text{items at risk of failing at}\: t_j \\ \:\:\:\:\;\;\;\;\;\;\;\:\:\:\:\:\:\:\;\;\;\;\;\;\;\:\:\:\:\:\:\:\;\;\;\;\;\;\;\:\:\:\:\:\:\:\;\;\;\;\;\;\;\:\:\: =\# \text{items that are functioning and uncensored just prior to}\: t_j\]&lt;/span>
Then, the Kaplan-Meier estimator is defined as: &lt;span class="math display">\[\hat{R_n(t)}={\prod}_{j:t_j&amp;lt;t} \frac{n_j-d_j}{n_j}\]&lt;/span>
where, &lt;span class="math inline">\(n_{j+1}=n_j-d_j-c_j\:\:\:\:, c_j=\# \text{items censored at }t_j\)&lt;/span>&lt;/p>
&lt;ul>
&lt;li>It can be shown that, &lt;span class="math inline">\(\hat{R_n(t)}\)&lt;/span> is a non-parametric MLE of the survival function &lt;span class="math inline">\(S(t)\)&lt;/span>.&lt;/li>
&lt;li>So, &lt;span class="math inline">\(E(\hat{R_n(t)})=S(t)\)&lt;/span> at a particular time &lt;span class="math inline">\(t\)&lt;/span>.&lt;/li>
&lt;li>The estimated asymptotic variance of the Kaplan-Meier estimator (by Greenwood’s formula) is &lt;span class="math display">\[\hat{V}(\hat{R_n(t)}) \approx \left(\frac{\partial \hat{R_n(t)}}{\partial \log \hat{R_n(t)}}\right)^2 \hat{V}(\log \hat{R_n(t)}) =(\hat{R_n(t)})^2 {\sum}_{j:t_j&amp;lt;t} \frac{d_j}{n_j(n_j-d_j)}\]&lt;/span>&lt;/li>
&lt;/ul>
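The product-limit formula can be checked by hand against survfit() on a small toy sample, since the fitted object exposes n.risk and n.event at each reported time:

```r
library(survival)

# Toy sample: observed times with event indicator (1 = event, 0 = censored)
t.obs <- c(1, 2, 2, 3, 4, 5)
delta <- c(1, 1, 0, 1, 0, 1)

fit <- survfit(Surv(t.obs, delta) ~ 1)

# Hand-computed product-limit estimate at each reported time
by.hand <- cumprod((fit$n.risk - fit$n.event) / fit$n.risk)
all.equal(by.hand, fit$surv)   # should be TRUE
```

At times where only censoring occurs, n.event is 0, so the corresponding factor is 1 and the estimate stays flat, matching the step-curve behaviour described earlier.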
&lt;p>Now, in R the &lt;em>survfit&lt;/em> function creates survival curves based on a formula. Let’s generate the overall survival curve for the entire cohort, assign it to object &lt;em>f1&lt;/em>, and look at the summary.&lt;/p>
&lt;pre class="r">&lt;code>f1=survfit(Surv(time, status)~1,data=lung)
f1
summary(f1)&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## Call: survfit(formula = Surv(time, status) ~ 1, data = lung)
##
## n events median 0.95LCL 0.95UCL
## [1,] 228 165 310 285 363&lt;/code>&lt;/pre>
&lt;p>These tables show a row for each time point where either the event occurred or a subject was censored. They show the number at risk (the number still remaining) and the cumulative survival at that instant.&lt;/p>
&lt;p>For more details about the outputs of &lt;em>survfit()&lt;/em>, type &lt;em>?summary.survfit&lt;/em>, or simply run &lt;em>names(f1)&lt;/em> to see all the output components.&lt;/p>
&lt;p>Here we’ve created a simple survival curve that doesn’t consider any different groupings, so we’ve specified just an intercept (e.g., &lt;em>~1&lt;/em>) in the formula that &lt;em>survfit&lt;/em> expects. It is similar to how we specify data for linear models with &lt;em>lm()&lt;/em>, we use the &lt;em>data=&lt;/em> argument to specify which data we’re using.&lt;/p>
&lt;p>You can give the &lt;em>summary()&lt;/em> function an option for which times you want shown in the results.
Let’s create a sequence of times from the lung dataset and look at the &lt;em>survfit&lt;/em> results at those time points.&lt;/p>
&lt;pre class="r">&lt;code># checking the range of the time variable
range(lung$time)
# creating a time sequence
seq(0,1100,100)
# visualizing the summary of &amp;#39;f1&amp;#39; for the above time sequence
summary(f1, times = seq(0,1100,100))&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] 5 1022&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] 0 100 200 300 400 500 600 700 800 900 1000 1100&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## Call: survfit(formula = Surv(time, status) ~ 1, data = lung)
##
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0 228 0 1.0000 0.0000 1.0000 1.000
## 100 196 31 0.8640 0.0227 0.8206 0.910
## 200 144 41 0.6803 0.0311 0.6219 0.744
## 300 92 29 0.5306 0.0346 0.4669 0.603
## 400 57 25 0.3768 0.0358 0.3128 0.454
## 500 41 12 0.2933 0.0351 0.2320 0.371
## 600 24 10 0.2136 0.0335 0.1571 0.290
## 700 16 8 0.1424 0.0303 0.0938 0.216
## 800 8 7 0.0783 0.0246 0.0423 0.145
## 900 3 2 0.0503 0.0228 0.0207 0.123
## 1000 2 0 0.0503 0.0228 0.0207 0.123&lt;/code>&lt;/pre>
&lt;p>What’s more interesting though is if we model something besides just an intercept. Let’s fit survival curves separately by sex.&lt;/p>
&lt;pre class="r">&lt;code>f2=survfit(Surv(time, status)~sex,data=lung)
f2&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## Call: survfit(formula = Surv(time, status) ~ sex, data = lung)
##
## n events median 0.95LCL 0.95UCL
## sex=1 138 112 270 212 310
## sex=2 90 53 426 348 550&lt;/code>&lt;/pre>
&lt;p>We can use the above time sequence vector in a summary call on &lt;em>f2&lt;/em> to get life tables at those intervals separately for males (1) and females (2). From these tables we can start to see that males tend to have worse survival than females.&lt;/p>
&lt;pre class="r">&lt;code>summary(f2,times = seq(0,1100,100))&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## Call: survfit(formula = Surv(time, status) ~ sex, data = lung)
##
## sex=1
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0 138 0 1.0000 0.0000 1.0000 1.000
## 100 114 24 0.8261 0.0323 0.7652 0.892
## 200 78 30 0.6073 0.0417 0.5309 0.695
## 300 49 20 0.4411 0.0439 0.3629 0.536
## 400 31 15 0.2977 0.0425 0.2250 0.394
## 500 20 7 0.2232 0.0402 0.1569 0.318
## 600 13 7 0.1451 0.0353 0.0900 0.234
## 700 8 5 0.0893 0.0293 0.0470 0.170
## 800 6 2 0.0670 0.0259 0.0314 0.143
## 900 2 2 0.0357 0.0216 0.0109 0.117
## 1000 2 0 0.0357 0.0216 0.0109 0.117
##
## sex=2
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0 90 0 1.0000 0.0000 1.0000 1.000
## 100 82 7 0.9221 0.0283 0.8683 0.979
## 200 66 11 0.7946 0.0432 0.7142 0.884
## 300 43 9 0.6742 0.0523 0.5791 0.785
## 400 26 10 0.5089 0.0603 0.4035 0.642
## 500 21 5 0.4110 0.0626 0.3050 0.554
## 600 11 3 0.3433 0.0634 0.2390 0.493
## 700 8 3 0.2496 0.0652 0.1496 0.417
## 800 2 5 0.0832 0.0499 0.0257 0.270
## 900 1 0 0.0832 0.0499 0.0257 0.270&lt;/code>&lt;/pre>
&lt;p>&lt;strong>Note:&lt;/strong> The output of &lt;em>summary()&lt;/em> is a &lt;em>‘list’&lt;/em>. The easiest way to convert it into a &lt;strong>data.frame&lt;/strong> is to use the &lt;em>tidy&lt;/em> function from the &lt;em>broom&lt;/em> package.&lt;/p>
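&lt;p>For instance, a minimal sketch (assuming the &lt;em>broom&lt;/em> package is installed):&lt;/p>

```r
library(survival)
library(broom)

f1 <- survfit(Surv(time, status) ~ 1, data = lung)

# tidy() returns one row per event/censoring time as a data.frame
td <- tidy(f1)
head(td)  # columns include time, n.risk, n.event, n.censor, estimate, std.error, conf.high, conf.low
```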
&lt;/div>
&lt;/div>
&lt;div id="kaplan-meier-plots" class="section level1">
&lt;h1>Kaplan-Meier Plots&lt;/h1>
&lt;p>Now we plot the &lt;em>survfit&lt;/em> object in base R to get the Kaplan-Meier plot.&lt;/p>
&lt;pre class="r">&lt;code>##-- Survival plot for overall data --##
plot(f1,
xlab = &amp;quot;Days&amp;quot;,
ylab=&amp;quot;Survival Probability&amp;quot;,
main=&amp;quot;Overall Survival Probability&amp;quot;)
##-- Survival plot grouped by Sex --##
plot(f2,
col=c(1,2),
xlab = &amp;quot;Days&amp;quot;,
ylab=&amp;quot;Survival Probability&amp;quot;,
main=&amp;quot;Survival Probability plot grouped by Sex&amp;quot;,
lwd=2)
legend(&amp;quot;top&amp;quot;,legend=c(&amp;quot;Male&amp;quot;,&amp;quot;Female&amp;quot;),col = c(1,2),lty = c(1,2),lwd=2,box.col = &amp;quot;white&amp;quot;,horiz = T,title = &amp;quot;Sex&amp;quot;)
box()&lt;/code>&lt;/pre>
&lt;p>&lt;img src="https://rajeshmajumderblog.netlify.app/blog/internal-project_viii/index_files/figure-html/unnamed-chunk-24-1.png" width="672" />&lt;img src="https://rajeshmajumderblog.netlify.app/blog/internal-project_viii/index_files/figure-html/unnamed-chunk-24-2.png" width="672" />&lt;/p>
&lt;ul>
&lt;li>&lt;p>The default plot in base R shows the step function (solid line) with associated confidence intervals (dotted lines).&lt;/p>&lt;/li>
&lt;li>&lt;p>Horizontal lines represent survival duration for the interval.&lt;/p>&lt;/li>
&lt;li>&lt;p>The height of vertical lines shows the change in cumulative probability.&lt;/p>&lt;/li>
&lt;li>&lt;p>Censored observations, indicated by tick marks, reduce the number at risk without causing a drop in the curve. (The tick marks for censored patients are not shown by default, but can be added using the option &lt;em>mark.time = TRUE&lt;/em>.)&lt;/p>&lt;/li>
&lt;li>&lt;p>When there are two or more survival curves, the plot omits the confidence intervals by default.&lt;/p>&lt;/li>
&lt;li>&lt;p>To plot survival curves in a more convenient way, use the &lt;em>ggsurvplot&lt;/em> function from the &lt;em>survminer&lt;/em> package.&lt;/p>&lt;/li>
&lt;li>&lt;p>But you can also customize these base R survival plots according to your needs. For example:&lt;/p>&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code>summary_1=summary(f1) # Summary of f1
##-- Survival Probability plot --##
plot(summary_1$time,summary_1$surv,
type=&amp;quot;S&amp;quot;,
lwd=2,
xlab = &amp;quot;Days&amp;quot;,
ylab=&amp;quot;Survival Probability&amp;quot;,
main=&amp;quot;Overall Survival Probability&amp;quot;)
##-- Plotting confidence interval with shade --##
polygon(c(summary_1$time,rev(summary_1$time)),c(summary_1$upper,rev(summary_1$lower)),
col=gray(0.4,0.4),border = NA)&lt;/code>&lt;/pre>
&lt;p>&lt;img src="https://rajeshmajumderblog.netlify.app/blog/internal-project_viii/index_files/figure-html/unnamed-chunk-26-1.png" width="672" />&lt;/p>
&lt;p>&lt;strong>Note:&lt;/strong> To plot the confidence interval as a shaded region, the trick is to use the &lt;em>polygon&lt;/em> function: provide the x coordinates twice, once in normal order and once in reverse order (with the function &lt;em>rev&lt;/em>), and provide the y coordinates as a vector of the upper bounds followed by the lower bounds in reverse order.&lt;/p>
&lt;p>Similarly, you can produce this type of customized plot for ‘f2’.&lt;/p>
&lt;pre class="r">&lt;code>summary_2=summary(f2)
##-- Informations that we need --##
# For Sex=1 (Male)
time_sex_1 = summary_2$time[summary_2$strata==&amp;quot;sex=1&amp;quot;] # Times for Sex=1
surv.prob_sex_1 = summary_2$surv[summary_2$strata==&amp;quot;sex=1&amp;quot;] # Survival Prob. for Sex=1
lower_sex_1 = summary_2$lower[summary_2$strata==&amp;quot;sex=1&amp;quot;] # lower Confidence level for Sex=1
upper_sex_1= summary_2$upper[summary_2$strata==&amp;quot;sex=1&amp;quot;] # upper Confidence level for Sex=1
# For Sex=2 (Female)
time_sex_2 = summary_2$time[summary_2$strata==&amp;quot;sex=2&amp;quot;] # Times for Sex=2
surv.prob_sex_2 = summary_2$surv[summary_2$strata==&amp;quot;sex=2&amp;quot;] # Survival Prob. for Sex=2
lower_sex_2 = summary_2$lower[summary_2$strata==&amp;quot;sex=2&amp;quot;] # lower Confidence level for Sex=2
upper_sex_2= summary_2$upper[summary_2$strata==&amp;quot;sex=2&amp;quot;] # upper Confidence level for Sex=2
##-- Plotting Survival Curve grouped by Sex --##
plot(time_sex_1,surv.prob_sex_1,
type=&amp;quot;S&amp;quot;,
lwd=2,
col= &amp;quot;blue&amp;quot;,
xlab = &amp;quot;Days&amp;quot;,
ylab=&amp;quot;Survival Probability&amp;quot;,
main=&amp;quot;Survival Probability Curve grouped by Gender&amp;quot;)
polygon(c(time_sex_1,rev(time_sex_1)),c(upper_sex_1,rev(lower_sex_1)),
col=adjustcolor(&amp;quot;blue&amp;quot;, alpha.f = 0.20),border = NA)
par(new=T)
plot(time_sex_2,surv.prob_sex_2,
type=&amp;quot;S&amp;quot;,
lwd=2,
col=&amp;quot;red&amp;quot;,
xlab = &amp;quot;&amp;quot;,
ylab=&amp;quot;&amp;quot;,
axes=FALSE)
polygon(c(time_sex_2,rev(time_sex_2)),c(upper_sex_2,rev(lower_sex_2)),
col=adjustcolor(&amp;quot;red&amp;quot;, alpha.f = 0.20),border = NA)
legend(&amp;quot;top&amp;quot;,legend=c(&amp;quot;Male&amp;quot;,&amp;quot;Female&amp;quot;),
fill=c(&amp;quot;blue&amp;quot;,&amp;quot;red&amp;quot;),
box.col = &amp;quot;white&amp;quot;,
horiz = T,
title = &amp;quot;Sex&amp;quot;)
box()&lt;/code>&lt;/pre>
&lt;p>&lt;img src="https://rajeshmajumderblog.netlify.app/blog/internal-project_viii/index_files/figure-html/unnamed-chunk-27-1.png" width="672" />&lt;/p>
&lt;/div>
&lt;div id="estimating-x-years-survival" class="section level1">
&lt;h1>Estimating x-years survival&lt;/h1>
&lt;p>One quantity often of interest in a survival analysis is the probability of surviving beyond a certain number &lt;span class="math inline">\(x\)&lt;/span> of years.&lt;/p>
&lt;p>For example, to estimate the probability of surviving to 1 year, use &lt;em>summary&lt;/em> with the times argument (&lt;strong>Note:&lt;/strong> the &lt;em>time&lt;/em> variable in the &lt;em>lung&lt;/em> data is actually in days, so we need to use &lt;em>times=365.25&lt;/em>)&lt;/p>
&lt;pre class="r">&lt;code>summary(f1,times = 365.25)&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## Call: survfit(formula = Surv(time, status) ~ 1, data = lung)
##
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 365 65 121 0.409 0.0358 0.345 0.486&lt;/code>&lt;/pre>
&lt;p>&lt;strong>We find the 1 year probability of survival in this study is 41%.&lt;/strong>&lt;/p>
&lt;p>Note that,&lt;/p>
&lt;ul>
&lt;li>&lt;p>n.risk = 65 = the number of subjects at risk at 1 year, i.e., the number of subjects still remaining in the study&lt;/p>&lt;/li>
&lt;li>&lt;p>n.event = 121 = the cumulative number of events that have occurred from the last listed time up to 1 year&lt;/p>&lt;/li>
&lt;/ul>
&lt;/div>
&lt;div id="testing-of-survival-curves" class="section level1">
&lt;h1>Testing of survival curves&lt;/h1>
&lt;p>When there are survival curves for &lt;span class="math inline">\(k\;(\text{where,}\:k\geq2)\)&lt;/span> groups, we need a statistical significance test comparing those survival curves. Since &lt;span class="math inline">\(S(t)\)&lt;/span> is a probability function, the &lt;strong>Log Rank test statistic&lt;/strong> is approximately distributed as a chi-square statistic with &lt;span class="math inline">\(k-1\)&lt;/span> degrees of freedom.&lt;/p>
&lt;p>The hypothesis are:&lt;/p>
&lt;p>&lt;span class="math display">\[H_0: \text{In terms of survivability, there is no difference between the groups} \\ a.g.\\ H_1: \text{There is a survival differential between the groups.}\]&lt;/span>&lt;/p>
&lt;p>In R, the Log Rank test is performed with the &lt;em>survdiff&lt;/em> function from the &lt;em>survival&lt;/em> package.&lt;/p>
&lt;pre class="r">&lt;code>test= survdiff(Surv(time,status)~sex,data=lung)
test&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## Call:
## survdiff(formula = Surv(time, status) ~ sex, data = lung)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## sex=1 138 112 91.6 4.55 10.3
## sex=2 90 53 73.4 5.68 10.3
##
## Chisq= 10.3 on 1 degrees of freedom, p= 0.001&lt;/code>&lt;/pre>
&lt;p>The Chi-Squared test statistic is 10.3 with 1 degree of freedom and the corresponding p-value is 0.001. Since this p-value is less than 0.05, we reject the null hypothesis.&lt;/p>
&lt;p>In other words, we have sufficient evidence to say that there is a statistically significant difference in survival between the Male(sex=1) &amp;amp; Female(sex=2).&lt;/p>
&lt;p>To extract the p-value from &lt;strong>survdiff&lt;/strong>, we use the following trick:&lt;/p>
&lt;pre class="r">&lt;code>p.val= 1 - pchisq(test$chisq, length(test$n) - 1)
round(p.val,3)&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] 0.001&lt;/code>&lt;/pre>
&lt;p>Alternatively, there is the &lt;em>sdp&lt;/em> function in the &lt;em>ezfun&lt;/em> package, which you can install using &lt;em>devtools::install_github(“zabore/ezfun”)&lt;/em>. It returns a formatted p-value.&lt;/p>
&lt;/div>
&lt;div id="coxs-proportional-hazard-regression-model" class="section level1">
&lt;h1>Cox’s Proportional Hazard Regression model&lt;/h1>
&lt;p>Kaplan-Meier curves are good for visualizing differences in survival between two categorical groups, but they don’t work well for assessing the effect of quantitative variables like age, gene expression, leukocyte count, etc. Cox PH regression can assess the effect of both categorical and continuous variables, and can model the effect of multiple variables at once.&lt;/p>
&lt;p>Cox PH regression models the natural log of the hazard at time &lt;span class="math inline">\(t\)&lt;/span>, denoted &lt;span class="math inline">\(h(t)\)&lt;/span>, as a function of the baseline hazard &lt;span class="math inline">\(h_0(t)\)&lt;/span> (&lt;em>the hazard for an individual where all exposure variables are 0&lt;/em>) and multiple exposure variables &lt;span class="math inline">\(X_1,X_2,...,X_p\)&lt;/span>. The form of the Cox PH model is:&lt;/p>
&lt;p>&lt;span class="math display">\[ln(h(t))= ln(h_0(t))+\beta_1 X_1+\beta_2 X_2+......+\beta_p X_p\]&lt;/span>
If you exponentiate both sides of the equation, and limit the right hand side to just a single categorical exposure variable &lt;span class="math inline">\((X_1)\)&lt;/span> with two groups (&lt;span class="math inline">\(X_1=1\)&lt;/span> for exposed and &lt;span class="math inline">\(X_1=0\)&lt;/span> for unexposed), the equation becomes: &lt;span class="math display">\[h_1(t)=h_0(t) \times e^{\beta_1 X_1}\]&lt;/span>&lt;/p>
&lt;p>Rearranging that equation lets you estimate the &lt;strong>hazard ratio&lt;/strong>, comparing the exposed to the unexposed individuals at time &lt;span class="math inline">\(t\)&lt;/span>:
&lt;span class="math display">\[HR(t)=\frac{h_1(t)}{h_0(t)}=e^{\beta_1}\]&lt;/span>&lt;/p>
&lt;p>This model shows that the hazard ratio is &lt;span class="math inline">\(e^{\beta_1}\)&lt;/span>, and remains constant over time &lt;span class="math inline">\(t\)&lt;/span> (&lt;em>hence the name proportional hazards regression&lt;/em>). The &lt;span class="math inline">\(\beta\)&lt;/span> values are the regression coefficients that are estimated from the model, and represent the &lt;span class="math inline">\(log(\text{Hazard Ratio})\)&lt;/span> for each unit increase in the corresponding predictor variable. The interpretation of the hazards ratio depends on the measurement scale of the predictor variable, but in simple terms, a positive coefficient indicates worse survival and a negative coefficient indicates better survival for the variable in question.&lt;/p>
&lt;p>Note that, the model is a &lt;em>semi-parametric&lt;/em> because:&lt;/p>
&lt;ul>
&lt;li>Model involves some parameters &lt;span class="math inline">\(\beta\)&lt;/span>,&lt;/li>
&lt;li>Model does not depend on any specific life-distribution.&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Note:&lt;/strong> parametric regression models for survival outcomes are also available, but they won’t be addressed in this training. You can read about them &lt;a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5233524/#:~:text=%20Parametric%20regression%20model%20for%20survival%20data%3A%20Weibull,%28%29%20function%20contained...%205%20Acknowledgements.%20%20More%20">here&lt;/a>.&lt;/p>
&lt;p>In R, to perform Cox regression, there is a function called &lt;em>coxph&lt;/em> under the &lt;em>survival&lt;/em> package. The &lt;em>coxph()&lt;/em> function uses the same syntax as &lt;em>lm()&lt;/em>, &lt;em>glm()&lt;/em>, etc. The response variable you create with &lt;em>Surv()&lt;/em> goes on the left hand side of the formula, specified with a &lt;em>~&lt;/em>. Explanatory variables go on the right side.&lt;/p>
&lt;p>Let’s go back to the lung cancer data and run a Cox regression on sex.&lt;/p>
&lt;pre class="r">&lt;code>Cox.fit=coxph(Surv(time,status)~sex,data=lung)
Cox.fit&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## Call:
## coxph(formula = Surv(time, status) ~ sex, data = lung)
##
## coef exp(coef) se(coef) z p
## sex -0.5310 0.5880 0.1672 -3.176 0.00149
##
## Likelihood ratio test=10.63 on 1 df, p=0.001111
## n= 228, number of events= 165&lt;/code>&lt;/pre>
&lt;p>The &lt;em>exp(coef)&lt;/em> column contains &lt;span class="math inline">\(e^{\beta_1}\)&lt;/span>. This is the hazard ratio (in our case HR=0.59) – the multiplicative effect of that variable on the hazard rate (for each unit increase in that variable). So, for a categorical variable like sex, going from male (baseline) to female results in approximately a 40% reduction in hazard, i.e., around 0.6 times as many females are dying as males at any given time. You could also flip the sign on the coef column and take &lt;em>exp(0.531)&lt;/em>, which you can interpret as being male resulting in a 1.7-fold increase in hazard: males die at approximately 1.7x the rate per unit time as females (females die at 0.588x the rate per unit time as males).&lt;/p>
&lt;p>Note that:&lt;/p>
&lt;ul>
&lt;li>HR=1: No effect&lt;/li>
&lt;li>HR&amp;gt;1: Increase in hazard&lt;/li>
&lt;li>HR&amp;lt;1: Reduction in hazard (protective)&lt;/li>
&lt;/ul>
&lt;p>You’ll also notice there’s a p-value on the &lt;em>sex&lt;/em> term, and a p-value on the &lt;em>overall model&lt;/em>. That 0.00111 p-value is really close to the p=0.00131 p-value we saw on the Kaplan-Meier plot. That’s because the KM plot is showing the log-rank test p-value. You can get this out of the Cox model with a call to &lt;em>summary(Cox.fit)&lt;/em>.&lt;/p>
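&lt;p>A minimal sketch of pulling the hazard ratio, its confidence interval, and the score (log-rank) test directly out of the fitted model:&lt;/p>

```r
library(survival)

Cox.fit <- coxph(Surv(time, status) ~ sex, data = lung)

exp(coef(Cox.fit))       # hazard ratio for sex (about 0.588)
exp(confint(Cox.fit))    # 95% CI for the hazard ratio
summary(Cox.fit)$sctest  # score (log-rank) test: statistic, df, p-value
```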
&lt;/div>
&lt;div id="competing-risks" class="section level1">
&lt;h1>Competing Risks&lt;/h1>
&lt;div id="what-is-competing-event-and-competing-risk" class="section level4">
&lt;h4>What is Competing Event ? and Competing Risk ?&lt;/h4>
&lt;p>In standard survival data, subjects are assumed to experience only one type of event over follow-up, such as death from breast cancer. In real life, however, subjects can potentially experience more than one type of event. For instance, if mortality is of research interest, then our observations – senior patients at an oncology department – could die from a heart attack or breast cancer, or even a traffic accident. When only one of these different types of event can occur, we refer to them as “competing events”, in the sense that they compete with each other to deliver the event of interest, and the occurrence of one type of event prevents the occurrence of the others. Accordingly, we call the probabilities of these events “competing risks”, in the sense that the probability of each competing event is somehow regulated by the other competing events, an interpretation suited to describing a survival process determined by multiple types of events.&lt;/p>
&lt;p>To better understand the competing event scenario, consider the following examples:&lt;/p>
&lt;ul>
&lt;li>A patient can die from breast cancer or from stroke, but not from both;&lt;/li>
&lt;li>A breast cancer patient may die after surgery before they can develop hospital infection;&lt;/li>
&lt;li>A soldier may die during a combat or in a traffic accident.&lt;/li>
&lt;/ul>
&lt;p>In the examples above, there is more than one pathway by which a subject can fail, but the failure, whether death or infection, can only occur once for each subject (without considering recurrent events). Therefore, the failures caused by different pathways are mutually exclusive and hence called competing events. Analysis of such data requires special considerations.&lt;/p>
&lt;/div>
&lt;div id="how-to-handel-this-type-of-situation" class="section level4">
&lt;h4>How to handle this type of situation?&lt;/h4>
&lt;p>Competing risks implies that a subject can experience one of a set of different events or outcomes. In this case, 2 different types of hazard functions are of interest: the &lt;strong>cause-specific hazard function&lt;/strong> and the &lt;strong>subdistribution hazard function.&lt;/strong>&lt;/p>
&lt;p>The key components for competing risks are :&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Cumulative incidence function (CIF)&lt;/strong>&lt;/li>
&lt;li>&lt;strong>Cause-specific hazard&lt;/strong>&lt;/li>
&lt;li>&lt;strong>Subdistribution hazard&lt;/strong>&lt;/li>
&lt;/ul>
&lt;/div>
&lt;div id="cumulative-incidence-functioncif" class="section level4">
&lt;h4>Cumulative incidence function (CIF)&lt;/h4>
&lt;p>The cumulative incidence function gives the proportion of patients at time &lt;span class="math inline">\(t\)&lt;/span> who have died from cause &lt;span class="math inline">\(k\)&lt;/span>, accounting for the fact that patients can die from other causes.&lt;/p>
&lt;p>Define: &lt;span class="math display">\[S_t= \text{Number at risk at the end of period}\;t \\ E_t= \text{Number of primary events in period}\;t \\ A_t= \text{Number of competing events in period}\;t\]&lt;/span>&lt;/p>
&lt;p>&lt;span class="math display">\[P(E=t|E \geq t) \approx \frac{E_t}{E_t+A_t+S_t}\]&lt;/span>&lt;/p>
&lt;p>&lt;strong>Note:&lt;/strong> &lt;span class="math display">\[P(E \geq t+1|E \geq t) \neq 1- \frac{E_t}{E_t+A_t+S_t}\]&lt;/span>&lt;/p>
&lt;p>That means, &lt;span class="math inline">\(\color{blue}{\text{Kaplan-Meier estimator does not work!}}\)&lt;/span>&lt;/p>
&lt;p>So, the survival function : &lt;span class="math inline">\(\hat{S}(t)= {\prod}^{t}_{j=1} \left(1- \frac{E_j+A_j}{E_j+A_j+S_j}\right)\)&lt;/span>&lt;/p>
&lt;p>and, CIF (of the primary events) : &lt;span class="math inline">\(\hat{C}(t)= {\sum}^t_{j=1} \frac{E_j}{E_j+A_j+S_j}\, \hat{S}(j-1)\)&lt;/span>&lt;/p>
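&lt;p>The recursion above can be traced with a toy life table; the counts below are made up purely for illustration:&lt;/p>

```r
# Hypothetical period counts
E <- c(3, 2, 1)    # primary events in each period
A <- c(1, 2, 0)    # competing events in each period
S <- c(16, 12, 11) # number at risk at the end of each period
n <- E + A + S     # number at risk entering each period

# Overall event-free survival: S(t) = prod_j (1 - (E_j + A_j)/n_j)
surv <- cumprod(1 - (E + A) / n)

# CIF of the primary event: C(t) = sum_j (E_j/n_j) * S(j-1), with S(0) = 1
cif <- cumsum((E / n) * c(1, head(surv, -1)))

surv  # 0.80 0.60 0.55
cif   # 0.15 0.25 0.30
```

&lt;p>Note that the CIF of the primary event, the CIF of the competing event, and the overall survival sum to 1 in each period.&lt;/p>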
&lt;/div>
&lt;div id="cause-specific-hazard" class="section level4">
&lt;h4>Cause-specific hazard&lt;/h4>
&lt;p>The cause-specific hazard, &lt;span class="math inline">\(h^{cs}_k(t)\)&lt;/span>, is the instantaneous risk of dying from a particular cause &lt;span class="math inline">\(k\)&lt;/span> given that the subject is still alive at time &lt;span class="math inline">\(t\)&lt;/span>.&lt;/p>
&lt;p>Mathematically, &lt;span class="math display">\[h^{cs}_k(t)={\lim}_{\Delta t \to 0} \frac{P(t \leq T&amp;lt; t+ \Delta t ,D=k|T \geq t)}{\Delta t}\]&lt;/span>&lt;/p>
&lt;/div>
&lt;div id="subdistribution-hazard" class="section level4">
&lt;h4>Subdistribution hazard&lt;/h4>
&lt;p>The subdistribution hazard, &lt;span class="math inline">\(h^{sd}_k(t)\)&lt;/span>, is the instantaneous risk of dying from a particular cause &lt;span class="math inline">\(k\)&lt;/span> given that the subject has not died from cause &lt;span class="math inline">\(k\)&lt;/span>.&lt;/p>
&lt;p>Mathematically, &lt;span class="math display">\[h^{sd}_k(t)={\lim}_{\Delta t \to 0} \frac{P(t \leq T&amp;lt; t+ \Delta t ,D=k\:|\:T \geq t \cup (T&amp;lt;t \cap K \neq k))}{\Delta t}\]&lt;/span>&lt;/p>
&lt;/div>
&lt;div id="a-bunch-off-additional-notes" class="section level4">
&lt;h4>A bunch of additional notes&lt;/h4>
&lt;ul>
&lt;li>When the events are independent (which is almost never true), the cause-specific hazard is unbiased.&lt;/li>
&lt;li>When the events are dependent, a variety of results can be obtained depending on the setting.&lt;/li>
&lt;li>Cumulative incidence estimated with K-M is always &lt;span class="math inline">\(\geq\)&lt;/span> cumulative incidence estimated with competing-risks methods, so K-M can only overestimate the cumulative incidence; the amount of overestimation depends on event rates and the dependence among events.&lt;/li>
&lt;li>To establish that a covariate is indeed acting on the event of interest, cause-specific hazards may be preferred for testing treatment or prognostic marker effects.&lt;/li>
&lt;/ul>
&lt;p>In R, the primary package for use in competing risks analysis is &lt;strong>cmprsk&lt;/strong>.&lt;/p>
&lt;pre class="r">&lt;code>library(cmprsk)&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div id="cumulative-incidence-in-melanoma-data" class="section level1">
&lt;h1>Cumulative incidence in Melanoma data&lt;/h1>
&lt;div id="description-of-the-melanoma-data" class="section level4">
&lt;h4>Description of the Melanoma data&lt;/h4>
&lt;p>The &lt;strong>Melanoma dataset&lt;/strong> is available in the &lt;strong>MASS&lt;/strong> package. It contains variables:&lt;/p>
&lt;ul>
&lt;li>&lt;em>time&lt;/em> survival times in days, possibly censored&lt;/li>
&lt;li>&lt;em>status&lt;/em> 1 died from melanoma, 2 alive, 3 dead from other causes.&lt;/li>
&lt;li>&lt;em>sex&lt;/em> 1 = male; 0 = female&lt;/li>
&lt;li>&lt;em>age&lt;/em> age in years&lt;/li>
&lt;li>&lt;em>year&lt;/em> of operation&lt;/li>
&lt;li>&lt;em>thickness&lt;/em> tumor thickness in mm.&lt;/li>
&lt;li>&lt;em>ulcer&lt;/em> 1= presence; 0= absence&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code>head(MASS::Melanoma)&lt;/code>&lt;/pre>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th style="text-align:right;">
time
&lt;/th>
&lt;th style="text-align:right;">
status
&lt;/th>
&lt;th style="text-align:right;">
sex
&lt;/th>
&lt;th style="text-align:right;">
age
&lt;/th>
&lt;th style="text-align:right;">
year
&lt;/th>
&lt;th style="text-align:right;">
thickness
&lt;/th>
&lt;th style="text-align:right;">
ulcer
&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td style="text-align:right;">
10
&lt;/td>
&lt;td style="text-align:right;">
3
&lt;/td>
&lt;td style="text-align:right;">
1
&lt;/td>
&lt;td style="text-align:right;">
76
&lt;/td>
&lt;td style="text-align:right;">
1972
&lt;/td>
&lt;td style="text-align:right;">
6.76
&lt;/td>
&lt;td style="text-align:right;">
1
&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:right;">
30
&lt;/td>
&lt;td style="text-align:right;">
3
&lt;/td>
&lt;td style="text-align:right;">
1
&lt;/td>
&lt;td style="text-align:right;">
56
&lt;/td>
&lt;td style="text-align:right;">
1968
&lt;/td>
&lt;td style="text-align:right;">
0.65
&lt;/td>
&lt;td style="text-align:right;">
0
&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:right;">
35
&lt;/td>
&lt;td style="text-align:right;">
2
&lt;/td>
&lt;td style="text-align:right;">
1
&lt;/td>
&lt;td style="text-align:right;">
41
&lt;/td>
&lt;td style="text-align:right;">
1977
&lt;/td>
&lt;td style="text-align:right;">
1.34
&lt;/td>
&lt;td style="text-align:right;">
0
&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:right;">
99
&lt;/td>
&lt;td style="text-align:right;">
3
&lt;/td>
&lt;td style="text-align:right;">
0
&lt;/td>
&lt;td style="text-align:right;">
71
&lt;/td>
&lt;td style="text-align:right;">
1968
&lt;/td>
&lt;td style="text-align:right;">
2.90
&lt;/td>
&lt;td style="text-align:right;">
0
&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:right;">
185
&lt;/td>
&lt;td style="text-align:right;">
1
&lt;/td>
&lt;td style="text-align:right;">
1
&lt;/td>
&lt;td style="text-align:right;">
52
&lt;/td>
&lt;td style="text-align:right;">
1965
&lt;/td>
&lt;td style="text-align:right;">
12.08
&lt;/td>
&lt;td style="text-align:right;">
1
&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:right;">
204
&lt;/td>
&lt;td style="text-align:right;">
1
&lt;/td>
&lt;td style="text-align:right;">
1
&lt;/td>
&lt;td style="text-align:right;">
28
&lt;/td>
&lt;td style="text-align:right;">
1971
&lt;/td>
&lt;td style="text-align:right;">
4.84
&lt;/td>
&lt;td style="text-align:right;">
1
&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;/div>
&lt;div id="cumulative-incidence-in-melanoma-data-1" class="section level4">
&lt;h4>Cumulative incidence in Melanoma data&lt;/h4>
&lt;p>Estimate the cumulative incidence in the context of competing risks using the &lt;em>cuminc&lt;/em> function.&lt;/p>
&lt;p>&lt;strong>Note:&lt;/strong> in the Melanoma data, censored patients are coded as 2 for &lt;em>status&lt;/em>, so we cannot rely on the &lt;em>cuminc()&lt;/em> function’s default censoring code of 0; we must set the &lt;em>cencode&lt;/em> option to 2.&lt;/p>
&lt;pre class="r">&lt;code>ci_fit= cuminc(MASS::Melanoma$time,MASS::Melanoma$status,cencode = 2)
ci_fit&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## Estimates and Variances:
## $est
## 1000 2000 3000 4000 5000
## 1 1 0.12745714 0.23013963 0.30962017 0.3387175 0.3387175
## 1 3 0.03426709 0.05045644 0.05811143 0.1059471 0.1059471
##
## $var
## 1000 2000 3000 4000 5000
## 1 1 0.0005481186 0.0009001172 0.0013789328 0.001690760 0.001690760
## 1 3 0.0001628354 0.0002451319 0.0002998642 0.001040155 0.001040155&lt;/code>&lt;/pre>
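&lt;p>The warning from the notes above, that a naive 1 - KM estimate can only overestimate the cumulative incidence, can be illustrated on these data. A sketch (assuming the &lt;strong>MASS&lt;/strong> and &lt;strong>cmprsk&lt;/strong> packages are installed):&lt;/p>

```r
library(survival)
library(cmprsk)

d <- MASS::Melanoma

# Naive approach: treat other-cause deaths as censored and take 1 - KM
km    <- survfit(Surv(time, status == 1) ~ 1, data = d)
naive <- 1 - min(km$surv)

# Competing-risks approach: cumulative incidence of death from melanoma
ci  <- cuminc(d$time, d$status, cencode = 2)
cif <- max(ci$`1 1`$est)

c(naive = naive, cif = cif)  # the naive estimate is the larger of the two
```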
&lt;/div>
&lt;/div>
&lt;div id="plot-the-cumulative-incidence-cif" class="section level1">
&lt;h1>Plot the Cumulative incidence (CIF)&lt;/h1>
&lt;p>Here, I’m showing how to plot the CIF using base R. Another beautiful function, &lt;em>ggcompetingrisks()&lt;/em>, is available in the &lt;em>survminer&lt;/em> package.&lt;/p>
&lt;pre class="r">&lt;code>plot(ci_fit,xlab=&amp;quot;Days&amp;quot;)&lt;/code>&lt;/pre>
&lt;p>&lt;img src="https://rajeshmajumderblog.netlify.app/blog/internal-project_viii/index_files/figure-html/unnamed-chunk-36-1.png" width="672" />&lt;/p>
&lt;p>In the legend:&lt;/p>
&lt;ul>
&lt;li>The 1st number indicates the group. In this case there is only one group (the overall data), so it is ‘1’ for both.&lt;/li>
&lt;li>The 2nd number indicates the event type. In this case the &lt;strong>solid line is 1, death from melanoma&lt;/strong>, and the &lt;strong>dashed line is 3, death from other causes&lt;/strong>.&lt;/li>
&lt;/ul>
&lt;/div>
&lt;div id="plot-the-cumulative-incidence-cif-manually" class="section level1">
&lt;h1>Plot the Cumulative incidence (CIF) manually&lt;/h1>
&lt;p>We can also plot the above CIF curve manually in base R.&lt;/p>
&lt;pre class="r">&lt;code>##-- For &amp;#39;status= 1&amp;#39;: death from melanoma --##
time_1= ci_fit$`1 1`$time
estimate_1= ci_fit$`1 1`$est
##-- For &amp;#39;status= 3&amp;#39;: death from other cases --##
time_3= ci_fit$`1 3`$time
estimate_3= ci_fit$`1 3`$est
##-- Plotting the Cumulative Incidence Curve --##
plot(x=time_1, y=estimate_1,
type = &amp;quot;S&amp;quot;,
lwd=1,
ylim= c(0,1),
col= adjustcolor(&amp;quot;blue&amp;quot;,alpha.f = 0.55),
xlab = &amp;quot;Days&amp;quot;,
ylab = &amp;quot;Probability of an event&amp;quot;,
main=&amp;quot;Cumulative incidence functions&amp;quot;)
par(new=T)
plot(x=time_3, y=estimate_3,
type = &amp;quot;S&amp;quot;,
lwd=1,
ylim=c(0,1),
col= adjustcolor(&amp;quot;red&amp;quot;,alpha.f = 0.55),
xlab = &amp;quot;&amp;quot;,
ylab = &amp;quot;&amp;quot;,
axes=F)
legend(&amp;quot;top&amp;quot;,
legend=c(&amp;quot;1: death from melanoma&amp;quot;,&amp;quot;3: death from other cases&amp;quot;),
col=c(adjustcolor(&amp;quot;blue&amp;quot;,alpha.f = 0.55),
adjustcolor(&amp;quot;red&amp;quot;,alpha.f = 0.55)),
lty=c(1,1),
title = &amp;quot;Event&amp;quot;,
box.col = &amp;quot;white&amp;quot;,
horiz = T)
box()&lt;/code>&lt;/pre>
&lt;p>&lt;img src="https://rajeshmajumderblog.netlify.app/blog/internal-project_viii/index_files/figure-html/unnamed-chunk-37-1.png" width="672" />&lt;/p>
&lt;/div>
&lt;div id="compare-cumultive-incidence-between-groups" class="section level1">
&lt;h1>Compare cumulative incidence between groups&lt;/h1>
&lt;p>Note that in &lt;em>cuminc&lt;/em> &lt;strong>Gray’s test&lt;/strong> is used for between-group tests.&lt;/p>
&lt;p>As an example, compare the Melanoma outcomes according to &lt;em>ulcer&lt;/em>, the presence or absence of ulceration. The results of the tests can be found in &lt;em>Tests&lt;/em>.&lt;/p>
&lt;pre class="r">&lt;code>ci_fit_ulcer= cuminc(ftime = MASS::Melanoma$time,
fstatus = MASS::Melanoma$status,
group = MASS::Melanoma$ulcer,
cencode = 2)
ci_fit_ulcer&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## Tests:
## stat pv df
## 1 26.120719 3.207240e-07 1
## 3 0.158662 6.903913e-01 1
## Estimates and Variances:
## $est
## 1000 2000 3000 4000 5000
## 0 1 0.03509042 0.10322276 0.18165409 0.18165409 0.1816541
## 1 1 0.24444444 0.38972746 0.46972340 0.53306966 NA
## 0 3 0.01746826 0.02624086 0.04028177 0.12960814 0.1296081
## 1 3 0.05555556 0.07981432 0.07981432 0.07981432 NA
##
## $var
## 1000 2000 3000 4000 5000
## 0 1 0.0002997449 0.0008952562 0.0019180376 0.0019180376 0.001918038
## 1 1 0.0020796399 0.0026929462 0.0035308463 0.0046320135 NA
## 0 3 0.0001512406 0.0002255429 0.0004165726 0.0029626459 0.002962646
## 1 3 0.0005902878 0.0008546097 0.0008546097 0.0008546097 NA&lt;/code>&lt;/pre>
&lt;pre class="r">&lt;code>ci_fit_ulcer[[&amp;#39;Tests&amp;#39;]]&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## stat pv df
## 1 26.120719 3.207240e-07 1
## 3 0.158662 6.903913e-01 1&lt;/code>&lt;/pre>
&lt;/div>
&lt;div id="plot-the-cumulative-incidence-cif-between-groups-manually" class="section level1">
&lt;h1>Plot the Cumulative incidence (CIF) between groups manually&lt;/h1>
&lt;p>We can plot the cumulative incidence between groups simply with the plot() function, although it is common practice to visualize this with the &lt;em>ggcompetingrisks&lt;/em> function. I always prefer base R functions and use additional packages only when absolutely necessary.&lt;/p>
&lt;pre class="r">&lt;code>plot(ci_fit_ulcer$`0 1`$time,ci_fit_ulcer$`0 1`$est,
type = &amp;quot;S&amp;quot;,
lwd=1,
lty=1,
ylim = c(0,1),
col=adjustcolor(&amp;quot;blue&amp;quot;,alpha.f = 0.5),
xlab = &amp;quot;Days&amp;quot;,
ylab = &amp;quot;Cumulative incidence of event&amp;quot;,
main=&amp;quot;Death by ulceration&amp;quot;,
bg=gray(0.4,0.3))
par(new=T)
plot(ci_fit_ulcer$`0 3`$time,ci_fit_ulcer$`0 3`$est,
type = &amp;quot;S&amp;quot;,
lwd=1,
lty=1,
ylim = c(0,1),
col=adjustcolor(&amp;quot;red&amp;quot;,alpha.f = 0.5),
xlab = &amp;quot;&amp;quot;,
ylab = &amp;quot;&amp;quot;,
axes = F)
par(new=T)
plot(ci_fit_ulcer$`1 1`$time,ci_fit_ulcer$`1 1`$est,
type = &amp;quot;S&amp;quot;,
lwd=1,
lty=2,
ylim = c(0,1),
col=adjustcolor(&amp;quot;blue&amp;quot;,alpha.f = 0.5),
xlab = &amp;quot;&amp;quot;,
ylab = &amp;quot;&amp;quot;,
axes = F)
par(new=T)
plot(ci_fit_ulcer$`1 3`$time,ci_fit_ulcer$`1 3`$est,
type = &amp;quot;S&amp;quot;,
lwd=1,
lty=2,
ylim = c(0,1),
col=adjustcolor(&amp;quot;red&amp;quot;,alpha.f = 0.5),
xlab = &amp;quot;&amp;quot;,
ylab = &amp;quot;&amp;quot;,
axes = F)
legend(&amp;quot;topleft&amp;quot;,
legend = c(&amp;quot;0: not ulcerated&amp;quot;,&amp;quot; 1: ulcerated&amp;quot;),
col=c(1,1),
lty=c(1,2),
title = &amp;quot;Group&amp;quot;,
box.col = &amp;quot;white&amp;quot;)
legend(&amp;quot;topright&amp;quot;,
legend = c(&amp;quot;1: death from melanoma&amp;quot;,&amp;quot;3: death from other cases&amp;quot;),
fill= c(adjustcolor(&amp;quot;blue&amp;quot;,alpha.f = 0.5),adjustcolor(&amp;quot;red&amp;quot;,alpha.f = 0.5)),
title = &amp;quot;Event&amp;quot;,
box.col = &amp;quot;white&amp;quot;)
box()&lt;/code>&lt;/pre>
&lt;p>&lt;img src="https://rajeshmajumderblog.netlify.app/blog/internal-project_viii/index_files/figure-html/unnamed-chunk-39-1.png" width="672" />&lt;/p>
&lt;/div>
&lt;div id="competing-risks-regression" class="section level1">
&lt;h1>Competing risks regression&lt;/h1>
&lt;p>As discussed earlier, there are two approaches:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Cause-specific hazards&lt;/strong>
&lt;ul>
&lt;li>instantaneous rate of occurrence of the given type of event in subjects who are currently event-free&lt;/li>
&lt;li>estimated using &lt;strong>Cox regression&lt;/strong> (&lt;em>coxph&lt;/em> function)&lt;/li>
&lt;/ul>&lt;/li>
&lt;li>&lt;strong>Subdistribution hazards&lt;/strong>
&lt;ul>
&lt;li>instantaneous rate of occurrence of the given type of event in subjects who have not yet experienced an event of that type.&lt;/li>
&lt;li>estimated using &lt;strong>Fine-Gray regression&lt;/strong> (&lt;em>crr&lt;/em> function)&lt;/li>
&lt;/ul>&lt;/li>
&lt;/ul>
&lt;div id="a-competing-risks-regression-in-melanoma-data--subdistribution-hazard-approach" class="section level3">
&lt;h3>(A) Competing risks regression in Melanoma data- subdistribution hazard approach&lt;/h3>
&lt;p>Let’s say we are interested in looking at the effect of age and sex on death from melanoma, with death from other causes as a competing event.&lt;/p>
&lt;p>&lt;strong>Notes:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;p>&lt;em>crr&lt;/em> requires specification of covariates as a matrix.&lt;/p>&lt;/li>
&lt;li>&lt;p>If more than one event is of interest, you can request results for a different event by using the failcode option, by default results are returned for &lt;em>failcode = 1&lt;/em>.&lt;/p>&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code>shr_fit= crr(ftime = MASS::Melanoma$time,
fstatus = MASS::Melanoma$status,
cov1 = MASS::Melanoma[,c(&amp;quot;sex&amp;quot;,&amp;quot;age&amp;quot;)],
cencode = 2)
shr_fit&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## convergence: TRUE
## coefficients:
## sex age
## 0.58840 0.01259
## standard errors:
## [1] 0.271800 0.009301
## two-sided p-values:
## sex age
## 0.03 0.18&lt;/code>&lt;/pre>
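&lt;p>The printed coefficients are on the log scale. As a small aside (a sketch, assuming the &lt;em>shr_fit&lt;/em> object from above; &lt;em>crr&lt;/em> stores the log-coefficients in the &lt;em>coef&lt;/em> component), exponentiating gives subdistribution hazard ratios:&lt;/p>
&lt;pre class="r">&lt;code>## exponentiate the log-subdistribution hazards to get
## subdistribution hazard ratios (e.g. exp(0.58840) is about 1.80 for sex)
exp(shr_fit$coef)&lt;/code>&lt;/pre>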
&lt;p>In the above example, both ‘sex’ and ‘age’ were coded as numeric variables. The &lt;em>crr&lt;/em> function cannot naturally handle character variables, and you will get an error, so if character variables are present we first have to create dummy variables using &lt;em>model.matrix&lt;/em>.&lt;/p>
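&lt;p>A minimal sketch of that workaround (hypothetical: it assumes &lt;em>sex&lt;/em> were stored as a factor or character rather than the 0/1 coding actually used here):&lt;/p>
&lt;pre class="r">&lt;code>## Build a numeric design matrix and drop the intercept column,
## then pass it to crr() as cov1
covs= model.matrix(~ sex + age, data = MASS::Melanoma)[,-1]
shr_fit2= crr(ftime = MASS::Melanoma$time,
fstatus = MASS::Melanoma$status,
cov1 = covs,
cencode = 2)&lt;/code>&lt;/pre>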
&lt;/div>
&lt;div id="b-competing-risks-regression-in-melanoma-data--cause-specific-hazard-approach" class="section level3">
&lt;h3>(B) Competing risks regression in Melanoma data- Cause-specific hazard approach&lt;/h3>
&lt;p>Censor all subjects who did not have the event of interest, in this case death from melanoma, and use &lt;em>coxph&lt;/em> as before. So patients who died from other causes are now censored for the cause-specific hazard approach to competing risks.&lt;/p>
&lt;pre class="r">&lt;code>chr_fit= coxph(Surv(time,ifelse(status== 1,1,0))~sex + age, data = MASS::Melanoma)
summary(chr_fit)&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## Call:
## coxph(formula = Surv(time, ifelse(status == 1, 1, 0)) ~ sex +
## age, data = MASS::Melanoma)
##
## n= 205, number of events= 57
##
## coef exp(coef) se(coef) z Pr(&amp;gt;|z|)
## sex 0.598259 1.818949 0.267639 2.235 0.0254 *
## age 0.016542 1.016679 0.008663 1.910 0.0562 .
## ---
## Signif. codes: 0 &amp;#39;***&amp;#39; 0.001 &amp;#39;**&amp;#39; 0.01 &amp;#39;*&amp;#39; 0.05 &amp;#39;.&amp;#39; 0.1 &amp;#39; &amp;#39; 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## sex 1.819 0.5498 1.0765 3.074
## age 1.017 0.9836 0.9996 1.034
##
## Concordance= 0.631 (se = 0.037 )
## Likelihood ratio test= 9.94 on 2 df, p=0.007
## Wald test = 10 on 2 df, p=0.007
## Score (logrank) test = 10.26 on 2 df, p=0.006&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div></description></item><item><title>Sample Size Calculation in R.</title><link>https://rajeshmajumderblog.netlify.app/blog/internal-project_vii/</link><pubDate>Wed, 16 Mar 2022 00:00:00 +0000</pubDate><guid>https://rajeshmajumderblog.netlify.app/blog/internal-project_vii/</guid><description>
&lt;script src="https://rajeshmajumderblog.netlify.app/blog/internal-project_vii/index_files/header-attrs/header-attrs.js">&lt;/script>
&lt;div id="TOC">
&lt;ul>
&lt;li>&lt;a href="#the-why-of-sample-size-calculations">The Why of Sample Size Calculations :&lt;/a>&lt;/li>
&lt;li>&lt;a href="#key-features-of-sample-size-calculation">Key features of Sample Size Calculation :&lt;/a>&lt;/li>
&lt;li>&lt;a href="#effect-size">Effect Size :&lt;/a>&lt;/li>
&lt;li>&lt;a href="#mathematical-formulas-for-calculating-sample-sazes">Mathematical Formulas for calculating sample Sazes :&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#a-for-estimation">(A) For Estimation :&lt;/a>&lt;/li>
&lt;li>&lt;a href="#b-for-testing">(B) For testing :&lt;/a>&lt;/li>
&lt;/ul>&lt;/li>
&lt;li>&lt;a href="#sample-size-calculation-in-r">Sample Size Calculation in R :&lt;/a>&lt;/li>
&lt;li>&lt;a href="#one-mean-t-test">One Mean T-test :&lt;/a>&lt;/li>
&lt;li>&lt;a href="#two-means-t-test">Two Means T-test :&lt;/a>&lt;/li>
&lt;li>&lt;a href="#paired-t-test">Paired T-test :&lt;/a>&lt;/li>
&lt;li>&lt;a href="#one-way-anova">One-Way ANOVA :&lt;/a>&lt;/li>
&lt;li>&lt;a href="#single-proportion-test">Single Proportion Test :&lt;/a>&lt;/li>
&lt;li>&lt;a href="#two-proportions-test">Two Proportions Test :&lt;/a>&lt;/li>
&lt;li>&lt;a href="#chi-squared-test">Chi-Squared Test :&lt;/a>&lt;/li>
&lt;li>&lt;a href="#simple-multiple-linear-regression">Simple &amp;amp; Multiple Linear Regression :&lt;/a>&lt;/li>
&lt;li>&lt;a href="#correlation">Correlation :&lt;/a>&lt;/li>
&lt;li>&lt;a href="#non-parametric-t-tests">Non-Parametric T-tests :&lt;/a>&lt;/li>
&lt;li>&lt;a href="#kruskal-wallace-test">Kruskal Wallace Test :&lt;/a>&lt;/li>
&lt;li>&lt;a href="#repeated-measures-anova">Repeated Measures ANOVA :&lt;/a>&lt;/li>
&lt;/ul>
&lt;/div>
&lt;div id="the-why-of-sample-size-calculations" class="section level2">
&lt;h2>The Why of Sample Size Calculations :&lt;/h2>
&lt;ul>
&lt;li>&lt;p>In designing an experiment, a key question is : &lt;strong>How many individuals/subjects do I need for my experiment ?&lt;/strong>&lt;/p>&lt;/li>
&lt;li>&lt;p>Too small a sample size can fail to detect the effect of interest in our experiment.&lt;/p>&lt;/li>
&lt;li>&lt;p>Too large a sample size wastes resources and subjects unnecessarily.&lt;/p>&lt;/li>
&lt;li>&lt;p>We want our sample size to be &lt;em>‘just right’&lt;/em>.&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;em>The answer:&lt;/em> &lt;strong>Sample Size Calculation&lt;/strong>.&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;em>Goal:&lt;/em> &lt;strong>We strive to have enough samples to reasonably detect the effect if it really is there, without wasting limited resources on too many samples.&lt;/strong>&lt;/p>&lt;/li>
&lt;/ul>
&lt;/div>
&lt;div id="key-features-of-sample-size-calculation" class="section level2">
&lt;h2>Key features of Sample Size Calculation :&lt;/h2>
&lt;ul>
&lt;li>&lt;p>&lt;strong>Effect Size:&lt;/strong> magnitude of the effect under the &lt;span class="math inline">\(H_1\)&lt;/span> (&lt;em>alternative&lt;/em>). The larger the effect size, the easier an effect is to detect, and the fewer samples are required.&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>Power:&lt;/strong> Probability of correctly rejecting the &lt;span class="math inline">\(H_0\)&lt;/span>(&lt;em>null&lt;/em>) if it is false. i.e., (&lt;span class="math inline">\(1-\beta\)&lt;/span>), where &lt;span class="math inline">\(\beta\)&lt;/span>= Type-II Error.&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>Significance level(&lt;span class="math inline">\(\alpha\)&lt;/span>):&lt;/strong> Probability of falsely rejecting the null hypothesis even though it is true. i.e., Type-I error.&lt;/p>&lt;/li>
&lt;/ul>
&lt;/div>
&lt;div id="effect-size" class="section level2">
&lt;h2>Effect Size :&lt;/h2>
&lt;ul>
&lt;li>&lt;p>While Power and Significance level are usually set irrespective of the data, the effect size is a property of the sample data.&lt;/p>&lt;/li>
&lt;li>&lt;p>It is essentially a function of the difference between the means of the null and alternative hypotheses over the variation (standard deviation) in the data.&lt;span class="math display">\[Effect\:Size \approx \frac{{|{\mu}_{H_1}-{\mu}_{H_0}}|}{\sigma}\]&lt;/span>&lt;/p>&lt;/li>
&lt;li>&lt;p>Note that, this sample size can also be calculated from the Confidence interval. But here we are ignoring that technique.&lt;/p>&lt;/li>
&lt;/ul>
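<p>The formula above is easy to compute directly in R. A quick sketch (the hypothesized mean, pilot mean, and SD below are made-up values for illustration only):</p>
&lt;pre class="r">&lt;code>## Hypothetical pilot values: hypothesized mean, observed mean, and SD
mu_H0= 98.6
mu_H1= 98.2
sigma= 0.7
effect_size= abs(mu_H1 - mu_H0)/sigma
effect_size&lt;/code>&lt;/pre>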
&lt;/div>
&lt;div id="mathematical-formulas-for-calculating-sample-sazes" class="section level2">
&lt;h2>Mathematical Formulas for calculating Sample Sizes :&lt;/h2>
&lt;div id="a-for-estimation" class="section level3">
&lt;h3>(A) For Estimation :&lt;/h3>
&lt;div class="figure">
&lt;img src="Sample%20Size%20estimation%20formula%20for%20Esimation%20point%20of%20view.PNG" alt="" />
&lt;p class="caption">For Estimation&lt;/p>
&lt;/div>
&lt;/div>
&lt;div id="b-for-testing" class="section level3">
&lt;h3>(B) For testing :&lt;/h3>
&lt;div class="figure">
&lt;img src="Sample%20Size%20estimation%20formula%20for%20testing%20Proportion.PNG" alt="" />
&lt;p class="caption">For Proportion&lt;/p>
&lt;/div>
&lt;div class="figure">
&lt;img src="Sample%20Size%20estimation%20formula%20for%20testing%20Mean.PNG" alt="" />
&lt;p class="caption">For Mean&lt;/p>
&lt;/div>
&lt;div class="figure">
&lt;img src="Sample%20Size%20estimation%20formula%20for%20Epidemiology.PNG" alt="" />
&lt;p class="caption">For Epidemiology Study Design&lt;/p>
&lt;/div>
&lt;div class="figure">
&lt;img src="Sample%20Size%20estimation%20formula%20for%20Epidemiology_2.PNG" alt="" />
&lt;p class="caption">For Epidemiology Study Design&lt;/p>
&lt;/div>
&lt;/div>
&lt;/div>
&lt;div id="sample-size-calculation-in-r" class="section level2">
&lt;h2>Sample Size Calculation in R :&lt;/h2>
&lt;table>
&lt;tbody>
&lt;tr class="odd">
&lt;td>Table of R packages &amp;amp; functions for calculating Sample Size for different tests&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;table>
&lt;thead>
&lt;tr class="header">
&lt;th align="left">Name of test&lt;/th>
&lt;th align="center">Package&lt;/th>
&lt;th align="center">Function&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr class="odd">
&lt;td align="left">One Mean T-test&lt;/td>
&lt;td align="center">pwr&lt;/td>
&lt;td align="center">pwr.t.test()&lt;/td>
&lt;/tr>
&lt;tr class="even">
&lt;td align="left">Two Means T-test&lt;/td>
&lt;td align="center">pwr&lt;/td>
&lt;td align="center">pwr.t.test()&lt;/td>
&lt;/tr>
&lt;tr class="odd">
&lt;td align="left">Two Means T-test (unequal Sample)&lt;/td>
&lt;td align="center">pwr&lt;/td>
&lt;td align="center">pwr.t2n.test()&lt;/td>
&lt;/tr>
&lt;tr class="even">
&lt;td align="left">Paired T-test&lt;/td>
&lt;td align="center">pwr&lt;/td>
&lt;td align="center">pwr.t.test()&lt;/td>
&lt;/tr>
&lt;tr class="odd">
&lt;td align="left">One-way ANOVA&lt;/td>
&lt;td align="center">pwr&lt;/td>
&lt;td align="center">pwr.anova.test()&lt;/td>
&lt;/tr>
&lt;tr class="even">
&lt;td align="left">Single Proportion Test&lt;/td>
&lt;td align="center">pwr&lt;/td>
&lt;td align="center">pwr.p.test()&lt;/td>
&lt;/tr>
&lt;tr class="odd">
&lt;td align="left">Two Proportions Test&lt;/td>
&lt;td align="center">pwr&lt;/td>
&lt;td align="center">pwr.2p.test()&lt;/td>
&lt;/tr>
&lt;tr class="even">
&lt;td align="left">Two Proportion Test (unequal Sample)&lt;/td>
&lt;td align="center">pwr&lt;/td>
&lt;td align="center">pwr.2p2n.test()&lt;/td>
&lt;/tr>
&lt;tr class="odd">
&lt;td align="left">Chi-Squared Test&lt;/td>
&lt;td align="center">pwr&lt;/td>
&lt;td align="center">pwr.chisq.test()&lt;/td>
&lt;/tr>
&lt;tr class="even">
&lt;td align="left">Simple Linear Regression&lt;/td>
&lt;td align="center">pwr&lt;/td>
&lt;td align="center">pwr.f2.test()&lt;/td>
&lt;/tr>
&lt;tr class="odd">
&lt;td align="left">Multiple Linear Regression&lt;/td>
&lt;td align="center">pwr&lt;/td>
&lt;td align="center">pwr.f2.test()&lt;/td>
&lt;/tr>
&lt;tr class="even">
&lt;td align="left">Correlation&lt;/td>
&lt;td align="center">pwr&lt;/td>
&lt;td align="center">pwr.r.test()&lt;/td>
&lt;/tr>
&lt;tr class="odd">
&lt;td align="left">One Mean Wilcoxon Test&lt;/td>
&lt;td align="center">pwr&lt;/td>
&lt;td align="center">pwr.t.test()+15%&lt;/td>
&lt;/tr>
&lt;tr class="even">
&lt;td align="left">Mann-Whitney Test&lt;/td>
&lt;td align="center">pwr&lt;/td>
&lt;td align="center">pwr.t.test()+15%&lt;/td>
&lt;/tr>
&lt;tr class="odd">
&lt;td align="left">Paired Wilcoxon Test&lt;/td>
&lt;td align="center">pwr&lt;/td>
&lt;td align="center">pwr.t.test()+15%&lt;/td>
&lt;/tr>
&lt;tr class="even">
&lt;td align="left">Kruskal Wallace Test&lt;/td>
&lt;td align="center">pwr&lt;/td>
&lt;td align="center">pwr.anova.test()+15%&lt;/td>
&lt;/tr>
&lt;tr class="odd">
&lt;td align="left">Repeated Measures ANOVA&lt;/td>
&lt;td align="center">WebPower&lt;/td>
&lt;td align="center">wp.rmanova()&lt;/td>
&lt;/tr>
&lt;tr class="even">
&lt;td align="left">Multi-way ANOVA (1 Category of interest)&lt;/td>
&lt;td align="center">WebPower&lt;/td>
&lt;td align="center">wp.kanova()&lt;/td>
&lt;/tr>
&lt;tr class="odd">
&lt;td align="left">Multi-way ANOVA (&amp;gt;1 Category of interest)&lt;/td>
&lt;td align="center">WebPower&lt;/td>
&lt;td align="center">wp.kanova()&lt;/td>
&lt;/tr>
&lt;tr class="even">
&lt;td align="left">Non-Parametric Regression (Logistic)&lt;/td>
&lt;td align="center">WebPower&lt;/td>
&lt;td align="center">wp.logistic()&lt;/td>
&lt;/tr>
&lt;tr class="odd">
&lt;td align="left">Non-Parametric Regression (Poisson)&lt;/td>
&lt;td align="center">WebPower&lt;/td>
&lt;td align="center">wp.poisson&lt;/td>
&lt;/tr>
&lt;tr class="even">
&lt;td align="left">Multilevel modeling: CRT&lt;/td>
&lt;td align="center">WebPower&lt;/td>
&lt;td align="center">wp.crt2arm/wp.crt3arm&lt;/td>
&lt;/tr>
&lt;tr class="odd">
&lt;td align="left">Multilevel modeling: MRT&lt;/td>
&lt;td align="center">WebPower&lt;/td>
&lt;td align="center">wp.mrt2arm/wp.mrt3arm&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;/div>
&lt;div id="one-mean-t-test" class="section level2">
&lt;h2>One Mean T-test :&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Description:&lt;/strong> This tests if a sample mean is any different from a set value for a normally distributed variable.&lt;/li>
&lt;/ul>
&lt;table>
&lt;thead>
&lt;tr class="header">
&lt;th>Numeric Var(s)&lt;/th>
&lt;th>Cat. Var(s)&lt;/th>
&lt;th>Cat. Var Group #&lt;/th>
&lt;th>Cat. Var # of interest&lt;/th>
&lt;th>Parametric&lt;/th>
&lt;th>Paired&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr class="odd">
&lt;td>1&lt;/td>
&lt;td>0&lt;/td>
&lt;td>0&lt;/td>
&lt;td>0&lt;/td>
&lt;td>Yes&lt;/td>
&lt;td>No&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;ul>
&lt;li>&lt;p>&lt;strong>Effect size calculation:&lt;/strong> &lt;span class="math inline">\(Effect\:Size(D)= \frac{{|{\mu}_{H_1}-{\mu}_{H_0}}|}{\sigma}\)&lt;/span>&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>&lt;a href="Example:(1)" class="uri">Example:(1)&lt;/a>&lt;/strong> &lt;strong>Is the average body temperature of college students any different from 98.6°F?&lt;/strong>&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>Solution:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;p>Here, &lt;span class="math inline">\(H_0\:: Avg\:Body\:temp.=98.6°F\)&lt;/span> and &lt;span class="math inline">\(H_1\:: Avg\:Body\:temp.\neq 98.6°F\)&lt;/span>&lt;/p>&lt;/li>
&lt;li>&lt;p>We will guess that the &lt;strong>effect sizes will be medium.&lt;/strong>&lt;/p>&lt;/li>
&lt;li>&lt;p>For t-tests: &lt;strong>0.2=small&lt;/strong>, &lt;strong>0.5=medium&lt;/strong>, and &lt;strong>0.8=large&lt;/strong> effect sizes.&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>R Package:&lt;/strong> &lt;em>pwr&lt;/em> Package&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>R function:&lt;/strong> &lt;em>pwr.t.test(d = , sig.level = , power = , type = c(“two.sample”, “&lt;strong>one.sample&lt;/strong>”, “paired”))&lt;/em>&lt;/p>
&lt;ul>
&lt;li>d= effect size&lt;/li>
&lt;li>sig.level= significance level&lt;/li>
&lt;li>power= power of test&lt;/li>
&lt;li>type= type of test&lt;/li>
&lt;/ul>&lt;/li>
&lt;li>&lt;p>&lt;strong>Answer of the problem:&lt;/strong>&lt;/p>
&lt;pre class="r">&lt;code> library(pwr)
Pwer_t=pwr.t.test(d=0.5, sig.level=0.05, power=0.80, type=&amp;quot;one.sample&amp;quot;,alternative=&amp;quot;two.sided&amp;quot;)
Pwer_t&lt;/code>&lt;/pre>
&lt;pre>&lt;code>##
## One-sample t test power calculation
##
## n = 33.36713
## d = 0.5
## sig.level = 0.05
## power = 0.8
## alternative = two.sided&lt;/code>&lt;/pre>
&lt;pre class="r">&lt;code> print(paste0(&amp;quot;Sample Size by rounding off is:&amp;quot;,round(Pwer_t$n,0)))&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] &amp;quot;Sample Size by rounding off is:33&amp;quot;&lt;/code>&lt;/pre>&lt;/li>
&lt;/ul>&lt;/li>
&lt;li>&lt;p>&lt;strong>&lt;a href="Example:(2)" class="uri">Example:(2)&lt;/a>&lt;/strong> &lt;strong>Calculate the sample size for the following scenarios (with &lt;span class="math inline">\(\alpha=0.05\)&lt;/span>, and power=0.80):&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;p>&lt;strong>(i)&lt;/strong> You are interested in determining if the average income of college freshman is less than Rs.20,000. You collect trial data and find that the mean income was Rs.14,500 (SD=6000).&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>(ii)&lt;/strong> You are interested in determining if the average sleep time change in a year for college freshman is different from zero. You collect the following data of sleep change (in hours).&lt;/p>
&lt;table>
&lt;thead>
&lt;tr class="header">
&lt;th align="center">Variable&lt;/th>
&lt;th align="center">Values&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr class="odd">
&lt;td align="center">Sleep Change&lt;/td>
&lt;td align="center">-0.55, 0.16, 2.6, 0.65, -0.23, 0.21, -4.3, 2, -1.7, 1.9&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>&lt;/li>
&lt;li>&lt;p>&lt;strong>(iii)&lt;/strong> You are interested in determining if the average weight change in a year for college freshman is greater than zero.&lt;/p>&lt;/li>
&lt;/ul>&lt;/li>
&lt;li>&lt;p>&lt;strong>Solution:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;ol style="list-style-type: lower-roman">
&lt;li>You are interested in determining if the average income of college freshman is less than Rs.20,000. You collect trial data and find that the mean income was Rs.14,500 (SD=6000).&lt;/li>
&lt;/ol>
&lt;ul>
&lt;li>&lt;p>Effect size = &lt;span class="math inline">\((Mean_{H_1}-Mean_{H_0})/SD= (14,500-20,000)/6000 = -0.917\)&lt;/span>&lt;/p>&lt;/li>
&lt;li>&lt;p>One-tailed test&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>R Code:&lt;/strong>&lt;/p>&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code> print(paste0(&amp;quot;The Sample Size is :&amp;quot;,round(pwr.t.test(d=-0.917, sig.level=0.05, power=0.80, type=&amp;quot;one.sample&amp;quot;, alternative=&amp;quot;less&amp;quot;)$n,0)))&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] &amp;quot;The Sample Size is :9&amp;quot;&lt;/code>&lt;/pre>&lt;/li>
&lt;li>&lt;ol start="2" style="list-style-type: lower-roman">
&lt;li>Effect size =&lt;span class="math inline">\((Mean_{H_1}-Mean_{H_0})/SD =(-0.446-0)/1.96 = -0.228\)&lt;/span>&lt;/li>
&lt;/ol>
&lt;ul>
&lt;li>&lt;p>Two-tailed test&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>R Code:&lt;/strong>&lt;/p>&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code>print(paste(&amp;quot;The Sample Size is :&amp;quot;,round(pwr.t.test(d=-0.228, sig.level=0.05, power=0.80, type=&amp;quot;one.sample&amp;quot;, alternative=&amp;quot;two.sided&amp;quot;)$n,0)))&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] &amp;quot;The Sample Size is : 153&amp;quot;&lt;/code>&lt;/pre>&lt;/li>
&lt;li>&lt;ol start="3" style="list-style-type: lower-roman">
&lt;li>&lt;em>Try it by yourself.&lt;/em>&lt;/li>
&lt;/ol>&lt;/li>
&lt;/ul>&lt;/li>
&lt;/ul>
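&lt;p>One practical note on rounding: &lt;em>round()&lt;/em> can round the required n down (33.37 becomes 33), which leaves the study slightly under-powered. A safer habit is &lt;em>ceiling()&lt;/em>, which gives 34 for Example (1):&lt;/p>
&lt;pre class="r">&lt;code>library(pwr)
## ceiling() always rounds up, so the achieved power is at least the target
ceiling(pwr.t.test(d=0.5, sig.level=0.05, power=0.80, type=&amp;quot;one.sample&amp;quot;)$n)&lt;/code>&lt;/pre>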
&lt;/div>
&lt;div id="two-means-t-test" class="section level2">
&lt;h2>Two Means T-test :&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Description:&lt;/strong> This tests if a mean from one group is different from the mean of another group for a normally distributed variable. In other words, it tests whether the difference in means is different from zero.&lt;/li>
&lt;/ul>
&lt;table>
&lt;thead>
&lt;tr class="header">
&lt;th>Numeric Var(s)&lt;/th>
&lt;th>Cat. Var(s)&lt;/th>
&lt;th>Cat. Var Group #&lt;/th>
&lt;th>Cat. Var # of interest&lt;/th>
&lt;th>Parametric&lt;/th>
&lt;th>Paired&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr class="odd">
&lt;td>1&lt;/td>
&lt;td>1&lt;/td>
&lt;td>2&lt;/td>
&lt;td>1&lt;/td>
&lt;td>Yes&lt;/td>
&lt;td>No&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;ul>
&lt;li>&lt;p>&lt;strong>Effect size calculation:&lt;/strong> &lt;span class="math inline">\(Effect\:Size(D)= \frac{{|Mean_{H_1}-Mean_{H_0}}|}{SD_{pooled}}\)&lt;/span>&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>&lt;a href="Example:(1)" class="uri">Example:(1)&lt;/a>&lt;/strong> &lt;strong>: Is the average body temperature higher in women than in men?&lt;/strong>&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>Solution:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;p>Here, &lt;span class="math inline">\(H_0\:: Avg\:difference\:Body\:temp.\:between\:men\:and\: women=0°F\)&lt;/span> and &lt;span class="math inline">\(H_1\:: Avg\:difference\:Body\:temp.\:between\:men\:and\: women&amp;gt;0°F\)&lt;/span>&lt;/p>&lt;/li>
&lt;li>&lt;p>We will guess that the &lt;strong>effect sizes will be medium.&lt;/strong>&lt;/p>&lt;/li>
&lt;li>&lt;p>For t-tests: &lt;strong>0.2=small&lt;/strong>, &lt;strong>0.5=medium&lt;/strong>, and &lt;strong>0.8=large&lt;/strong> effect sizes.&lt;/p>&lt;/li>
&lt;li>&lt;p>We selected &lt;em>greater&lt;/em> because we only want to test whether women’s temperature is higher, not lower (group 1 is women, group 2 is men).&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>R Package:&lt;/strong> &lt;em>pwr&lt;/em> Package&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>R function:&lt;/strong> &lt;em>pwr.t.test(d = , sig.level = , power = , type = c(“&lt;strong>two.sample&lt;/strong>”, “one.sample”, “paired”))&lt;/em>&lt;/p>
&lt;ul>
&lt;li>d= effect size&lt;/li>
&lt;li>sig.level= significance level&lt;/li>
&lt;li>power= power of test&lt;/li>
&lt;li>type= type of test&lt;/li>
&lt;/ul>&lt;/li>
&lt;li>&lt;p>&lt;strong>Answer of the problem:&lt;/strong>&lt;/p>
&lt;pre class="r">&lt;code>print(paste0(&amp;quot;The Sample Size is :&amp;quot;,round(pwr.t.test(d=0.5, sig.level=0.05, power=0.80,type=&amp;quot;two.sample&amp;quot;, alternative=&amp;quot;greater&amp;quot;)$n,0)))&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] &amp;quot;The Sample Size is :50&amp;quot;&lt;/code>&lt;/pre>&lt;/li>
&lt;/ul>&lt;/li>
&lt;li>&lt;p>&lt;strong>&lt;a href="Example:(2)" class="uri">Example:(2)&lt;/a>&lt;/strong> &lt;strong>Calculate the sample size for the following scenarios (with α=0.05, and power=0.80):&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;p>&lt;strong>(i)&lt;/strong> You are interested in determining if the average daily caloric intake is different between men and women. You collected trial data and found the average caloric intake for males to be 2350.2 (SD=258), while females had an intake of 1872.4 (SD=420).&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>(ii)&lt;/strong> You are interested in determining if the average protein level in blood is different between men and women. You collected the following trial data on protein level (grams/deciliter).&lt;/p>
&lt;table>
&lt;thead>
&lt;tr class="header">
&lt;th align="center">Protein&lt;/th>
&lt;th align="center">levels&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr class="odd">
&lt;td align="center">Male Protein&lt;/td>
&lt;td align="center">1.8, 5.8, 7.1, 4.6, 5.5, 2.4, 8.3, 1.2&lt;/td>
&lt;/tr>
&lt;tr class="even">
&lt;td align="center">Female Protein&lt;/td>
&lt;td align="center">9.5, 2.6, 3.7, 4.7, 6.4, 8.4, 3.1, 1.4&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>&lt;/li>
&lt;li>&lt;p>&lt;strong>(iii)&lt;/strong> You are interested in determining if the average glucose level in blood is lower in men than women.&lt;/p>&lt;/li>
&lt;/ul>&lt;/li>
&lt;li>&lt;p>&lt;strong>Solution:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;ol style="list-style-type: lower-roman">
&lt;li>You are interested in determining if the average daily caloric intake is different between men and women. You collected trial data and found the average caloric intake for males to be 2350.2 (SD=258), while females had an intake of 1872.4 (SD=420).&lt;/li>
&lt;/ol>
&lt;ul>
&lt;li>&lt;p>Effect size = &lt;span class="math inline">\((Mean_{H1}-Mean_{H0})/ SD_{pooled} =(2350.2-1872.4)/ \sqrt{(258^2+ 420^2)/2} =477.8/348.54 = 1.37\)&lt;/span>&lt;/p>&lt;/li>
&lt;li>&lt;p>two-tailed test&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>R Code:&lt;/strong>&lt;/p>&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code> print(paste0(&amp;quot;The Sample Size is :&amp;quot;,round(pwr.t.test(d=1.37, sig.level=0.05, power=0.80, type=&amp;quot;two.sample&amp;quot;,alternative=&amp;quot;two.sided&amp;quot;)$n,0)))&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] &amp;quot;The Sample Size is :9&amp;quot;&lt;/code>&lt;/pre>&lt;/li>
&lt;li>&lt;ol start="2" style="list-style-type: lower-roman">
&lt;li>Effect size =&lt;span class="math inline">\((Mean_{H_1}-Mean_{H_0})/ SD_{pooled} =(4.59-4.98)/ \sqrt{(2.58^2+ 2.88^2)/2} = -0.14\)&lt;/span>&lt;/li>
&lt;/ol>
&lt;ul>
&lt;li>&lt;p>Two-tailed test&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>R Code:&lt;/strong>&lt;/p>&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code>print(paste(&amp;quot;The Sample Size is :&amp;quot;,round(pwr.t.test(d=-0.14, sig.level=0.05, power=0.80, type=&amp;quot;two.sample&amp;quot;, alternative=&amp;quot;two.sided&amp;quot;)$n,0)))&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] &amp;quot;The Sample Size is : 802&amp;quot;&lt;/code>&lt;/pre>&lt;/li>
&lt;li>&lt;ol start="3" style="list-style-type: lower-roman">
&lt;li>&lt;em>Try it by yourself.&lt;/em>&lt;/li>
&lt;/ol>&lt;/li>
&lt;/ul>&lt;/li>
&lt;/ul>
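&lt;p>For part (ii), the pooled-SD effect size can also be computed directly from the trial data above (a small sketch; it reproduces the hand calculation):&lt;/p>
&lt;pre class="r">&lt;code>male= c(1.8, 5.8, 7.1, 4.6, 5.5, 2.4, 8.3, 1.2)
female= c(9.5, 2.6, 3.7, 4.7, 6.4, 8.4, 3.1, 1.4)
sd_pooled= sqrt((sd(male)^2 + sd(female)^2)/2)
## about -0.14, matching the value used above
(mean(male) - mean(female))/sd_pooled&lt;/code>&lt;/pre>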
&lt;/div>
&lt;div id="paired-t-test" class="section level2">
&lt;h2>Paired T-test :&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Description:&lt;/strong> This tests if a mean from one group is different from the mean of another group, where the groups are dependent (not independent), for a normally distributed variable. Pairing can be leaves on the same branch, siblings, the same individual before and after a trial, etc.&lt;/li>
&lt;/ul>
&lt;table>
&lt;thead>
&lt;tr class="header">
&lt;th>Numeric Var(s)&lt;/th>
&lt;th>Cat. Var(s)&lt;/th>
&lt;th>Cat. Var Group #&lt;/th>
&lt;th>Cat. Var # of interest&lt;/th>
&lt;th>Parametric&lt;/th>
&lt;th>Paired&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr class="odd">
&lt;td>1&lt;/td>
&lt;td>1&lt;/td>
&lt;td>2&lt;/td>
&lt;td>1&lt;/td>
&lt;td>Yes&lt;/td>
&lt;td>Yes&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;ul>
&lt;li>&lt;p>&lt;strong>Effect size calculation:&lt;/strong> &lt;span class="math inline">\(Effect\:Size(D)= \frac{{|Mean_{H_1}-Mean_{H_0}}|}{SD_{pooled}}\)&lt;/span>&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>&lt;a href="Example:(1)" class="uri">Example:(1)&lt;/a>&lt;/strong> &lt;strong>Is heart rate higher in patients after a run compared to before a run?&lt;/strong>&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>Solution:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;p>Here, &lt;span class="math inline">\(H_0\:: bpm(after) - bpm(before) \leq 0\)&lt;/span> and &lt;span class="math inline">\(H_1\:: bpm(after) - bpm(before)&amp;gt;0\)&lt;/span>&lt;/p>&lt;/li>
&lt;li>&lt;p>We will guess that the &lt;strong>effect sizes will be large.&lt;/strong>&lt;/p>&lt;/li>
&lt;li>&lt;p>For t-tests: &lt;strong>0.2=small&lt;/strong>, &lt;strong>0.5=medium&lt;/strong>, and &lt;strong>0.8=large&lt;/strong> effect sizes.&lt;/p>&lt;/li>
&lt;li>&lt;p>We selected a one-tailed test because we only care whether bpm is higher after a run.&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>R Package:&lt;/strong> &lt;em>pwr&lt;/em> Package&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>R function:&lt;/strong> &lt;em>pwr.t.test(d = , sig.level = , power = , type = c(“two.sample”, “one.sample”, “&lt;strong>paired&lt;/strong>”))&lt;/em>&lt;/p>
&lt;ul>
&lt;li>d= effect size&lt;/li>
&lt;li>sig.level= significance level&lt;/li>
&lt;li>power= power of test&lt;/li>
&lt;li>type= type of test&lt;/li>
&lt;/ul>&lt;/li>
&lt;li>&lt;p>&lt;strong>Answer of the problem:&lt;/strong>&lt;/p>
&lt;pre class="r">&lt;code>print(paste0(&amp;quot;The Sample Size is :&amp;quot;,round(pwr.t.test(d=0.8, sig.level=0.05, power=0.80, type=&amp;quot;paired&amp;quot;, alternative=&amp;quot;greater&amp;quot;)$n,0)))&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] &amp;quot;The Sample Size is :11&amp;quot;&lt;/code>&lt;/pre>&lt;/li>
&lt;/ul>&lt;/li>
&lt;li>&lt;p>&lt;strong>&lt;a href="Example:(2)" class="uri">Example:(2)&lt;/a>&lt;/strong> &lt;strong>Calculate the sample size for the following scenarios (with α=0.05, and power=0.80):&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;p>&lt;strong>(i)&lt;/strong> You are interested in determining if metabolic rate in patients after surgery is different from before surgery. You collected trial data and found a mean difference of 0.73 (SD=2.9).&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>(ii)&lt;/strong> You are interested in determining if heart rate is higher in patients after a doctor’s visit compared to before a visit. You collected the following trial data and found mean heart rate before and after a visit.&lt;/p>
&lt;table>
&lt;thead>
&lt;tr class="header">
&lt;th align="center">Heart rate&lt;/th>
&lt;th align="center">levels&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr class="odd">
&lt;td align="center">BPM before&lt;/td>
&lt;td align="center">126, 88, 53.1, 98.5, 88.3, 82.5, 105, 41.9&lt;/td>
&lt;/tr>
&lt;tr class="even">
&lt;td align="center">BPM after&lt;/td>
&lt;td align="center">138.6, 110.1, 58.44, 110.2, 89.61, 98.6, 115.3, 64.3&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>&lt;/li>
&lt;/ul>&lt;/li>
&lt;li>&lt;p>&lt;strong>Solution:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;ol style="list-style-type: lower-roman">
&lt;li>You are interested in determining if metabolic rate in patients after surgery is different from before surgery. You collected trial data and found a mean difference of 0.73 (SD=2.9).&lt;/li>
&lt;/ol>
&lt;ul>
&lt;li>&lt;p>Effect size = &lt;span class="math inline">\((Mean_{H_1}-Mean_{H_0})/SD =(0.73)/ 2.9 = 0.25\)&lt;/span>&lt;/p>&lt;/li>
&lt;li>&lt;p>Two-tailed test&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>R Code:&lt;/strong>&lt;/p>&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code> print(paste0(&amp;quot;The Sample Size is :&amp;quot;,round(pwr.t.test(d=0.25, sig.level=0.05, power=0.80, type=&amp;quot;paired&amp;quot;, alternative=&amp;quot;two.sided&amp;quot;)$n,0)))&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] &amp;quot;The Sample Size is :128&amp;quot;&lt;/code>&lt;/pre>&lt;/li>
&lt;li>&lt;ol start="2" style="list-style-type: lower-roman">
&lt;li>Effect size = &lt;span class="math inline">\((Mean_{H_1}-Mean_{H_0})/ SD_{pooled} =(98.1-85.4)/ \sqrt{(26.8^2+27.2^2)/2} =12.7/27 = 0.47\)&lt;/span>&lt;/li>
&lt;/ol>
&lt;ul>
&lt;li>&lt;p>One-tailed test&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>R Code:&lt;/strong>&lt;/p>&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code>print(paste(&amp;quot;The Sample Size is :&amp;quot;,round(pwr.t.test(d=0.47, sig.level=0.05, power=0.80, type=&amp;quot;paired&amp;quot;, alternative=&amp;quot;greater&amp;quot;)$n,0)))&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] &amp;quot;The Sample Size is : 29&amp;quot;&lt;/code>&lt;/pre>&lt;/li>
&lt;/ul>&lt;/li>
&lt;/ul>
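The effect sizes used above can be cross-checked by hand. As a quick sketch outside the R workflow (assuming the usual sample-standard-deviation convention for the BPM data), the same arithmetic in Python:

```python
import statistics as st

# (i) reported mean difference 0.73 with SD 2.9
d_i = 0.73 / 2.9

# (ii) paired BPM data from the table above
before = [126, 88, 53.1, 98.5, 88.3, 82.5, 105, 41.9]
after = [138.6, 110.1, 58.44, 110.2, 89.61, 98.6, 115.3, 64.3]
diff = st.mean(after) - st.mean(before)                              # ~12.7
sd_pooled = ((st.stdev(before)**2 + st.stdev(after)**2) / 2) ** 0.5  # ~27
d_ii = diff / sd_pooled                                              # ~0.47
print(round(d_i, 2), round(d_ii, 2))
```

This reproduces the two effect sizes (0.25 and 0.47) fed into `pwr.t.test` above.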
&lt;/div>
&lt;div id="one-way-anova" class="section level2">
&lt;h2>One-Way ANOVA :&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Description:&lt;/strong> This tests whether at least one mean differs among more than two groups for a normally distributed variable. ANOVA extends the two-means t-test to more than two groups.&lt;/li>
&lt;/ul>
&lt;table>
&lt;thead>
&lt;tr class="header">
&lt;th>Numeric Var(s)&lt;/th>
&lt;th>Cat. Var(s)&lt;/th>
&lt;th>Cat. Var Group #&lt;/th>
&lt;th>Cat. Var # of interest&lt;/th>
&lt;th>Parametric&lt;/th>
&lt;th>Paired&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr class="odd">
&lt;td>1&lt;/td>
&lt;td>1&lt;/td>
&lt;td>&amp;gt; 2&lt;/td>
&lt;td>1&lt;/td>
&lt;td>Yes&lt;/td>
&lt;td>No&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;ul>
&lt;li>&lt;p>&lt;strong>Effect size calculation:&lt;/strong> &lt;span class="math display">\[Effect\:Size(f)=\sqrt{\frac{\eta^2}{1-\eta^2}}\]&lt;/span> Where, &lt;span class="math display">\[\eta^2 = \frac{SS_T}{TSS}=\frac{Treatment\:Sum\:Squares}{Total\:Sum\:Squares}\]&lt;/span>&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>Example:(1)&lt;/strong> &lt;strong>Is there a difference in new car interest rates across 6 different cities?&lt;/strong>&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>Solution:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;p>Here, &lt;span class="math inline">\(H_0\)&lt;/span>: all city means are equal, and &lt;span class="math inline">\(H_1\)&lt;/span>: at least one mean differs.&lt;/p>&lt;/li>
&lt;li>&lt;p>There are a total of 6 groups (cities).&lt;/p>&lt;/li>
&lt;li>&lt;p>We will guess that the &lt;strong>effect sizes will be small.&lt;/strong>&lt;/p>&lt;/li>
&lt;li>&lt;p>For ANOVA (Cohen’s &lt;em>f&lt;/em>): &lt;strong>0.1=small&lt;/strong>, &lt;strong>0.25=medium&lt;/strong>, and &lt;strong>0.4=large&lt;/strong> effect sizes.&lt;/p>&lt;/li>
&lt;li>&lt;p>Groups assumed to be the same size.&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>R Package:&lt;/strong> &lt;em>pwr&lt;/em> Package&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>R function:&lt;/strong> &lt;em>pwr.anova.test(k =, f = , sig.level = , power = )&lt;/em>&lt;/p>
&lt;ul>
&lt;li>k= number of groups&lt;/li>
&lt;li>f= effect size&lt;/li>
&lt;li>sig.level= significance level&lt;/li>
&lt;li>power= power of test&lt;/li>
&lt;/ul>&lt;/li>
&lt;li>&lt;p>&lt;strong>Answer of the problem:&lt;/strong>&lt;/p>
&lt;pre class="r">&lt;code>print(paste0(&amp;quot;The Sample Size is :&amp;quot;,round(pwr.anova.test(k =6 , f =0.1 , sig.level=0.05 , power =0.80 )$n,0)))&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] &amp;quot;The Sample Size is :215&amp;quot;&lt;/code>&lt;/pre>&lt;/li>
&lt;/ul>&lt;/li>
&lt;li>&lt;p>&lt;strong>Example:(2)&lt;/strong> &lt;strong>Calculate the sample size for the following scenarios (with α=0.05, and power=0.80):&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;p>&lt;strong>(i)&lt;/strong> You are interested in determining if there is a difference in weight lost between 4 different surgery options. You collect the following trial data of weight lost in pounds.&lt;/p>
&lt;table>
&lt;thead>
&lt;tr class="header">
&lt;th align="center">Surgery&lt;/th>
&lt;th align="center">Weight Measures&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr class="odd">
&lt;td align="center">A&lt;/td>
&lt;td align="center">6.3, 2.8, 7.8, 7.9, 4.9&lt;/td>
&lt;/tr>
&lt;tr class="even">
&lt;td align="center">B&lt;/td>
&lt;td align="center">9.9, 4.1, 3.9, 6.3, 6.9&lt;/td>
&lt;/tr>
&lt;tr class="odd">
&lt;td align="center">C&lt;/td>
&lt;td align="center">5.1, 2.9, 3.6, 5.7, 4.5&lt;/td>
&lt;/tr>
&lt;tr class="even">
&lt;td align="center">D&lt;/td>
&lt;td align="center">1.0, 2.8, 4.8, 3.9, 1.6&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>&lt;/li>
&lt;li>&lt;p>&lt;strong>(ii)&lt;/strong> You are interested in determining if there is a difference in white blood cell counts between 5 different medication regimes.&lt;/p>&lt;/li>
&lt;/ul>&lt;/li>
&lt;li>&lt;p>&lt;strong>Solution:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;ol style="list-style-type: lower-roman">
&lt;li>Here,&lt;/li>
&lt;/ol>
&lt;ul>
&lt;li>&lt;p>&lt;span class="math inline">\(\eta = SS_T/TSS=31.47/(31.47+62.87) = 0.33\)&lt;/span>
Note that, you can calculate &lt;span class="math inline">\(SS_T\)&lt;/span> &amp;amp; &lt;span class="math inline">\(TSS\)&lt;/span> by performing ANOVA on the dataset using &lt;em>aov()&lt;/em> function.&lt;/p>&lt;/li>
&lt;li>&lt;p>Effect size&lt;span class="math inline">\((f)\)&lt;/span> = &lt;span class="math inline">\(\sqrt{\eta^2/(1-\eta^2)}=\sqrt{0.33/(1- 0.33)} = 0.7\)&lt;/span>&lt;/p>&lt;/li>
&lt;li>&lt;p>No. of groups= 4&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>R Code:&lt;/strong>&lt;/p>&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code> print(paste0(&amp;quot;The Sample Size is :&amp;quot;,round(pwr.anova.test(k =4, f =0.7, sig.level=0.05, power =0.80 )$n,0)))&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] &amp;quot;The Sample Size is :7&amp;quot;&lt;/code>&lt;/pre>&lt;/li>
&lt;li>&lt;ol start="2" style="list-style-type: lower-roman">
&lt;li>You are interested in determining if there is a difference in white blood cell counts between 5 different medication regimes.&lt;/li>
&lt;/ol>
&lt;ul>
&lt;li>&lt;p>Guessed a medium effect size (0.25)&lt;/p>&lt;/li>
&lt;li>&lt;p>No. of groups= 5&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>R Code:&lt;/strong>&lt;/p>&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code>print(paste(&amp;quot;The Sample Size is :&amp;quot;,round(pwr.anova.test(k =5, f =0.25, sig.level=0.05, power =0.80 )$n,0)))&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] &amp;quot;The Sample Size is : 39&amp;quot;&lt;/code>&lt;/pre>&lt;/li>
&lt;/ul>&lt;/li>
&lt;/ul>
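The η² → f conversion in solution (i) is easy to verify by hand. A minimal cross-check in Python, using the sums of squares quoted in the text:

```python
# treatment SS and residual SS as given in the solution above
eta_sq = 31.47 / (31.47 + 62.87)        # eta-squared = SS_T / TSS
f = (eta_sq / (1 - eta_sq)) ** 0.5      # Cohen's f
print(round(eta_sq, 2), round(f, 1))
```

This reproduces η² ≈ 0.33 and f ≈ 0.7, the value passed to `pwr.anova.test` above.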
&lt;/div>
&lt;div id="single-proportion-test" class="section level2">
&lt;h2>Single Proportion Test :&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Description:&lt;/strong> Use this test when you have a single sample proportion and want to know whether it differs from some constant (hypothesized) proportion.&lt;/li>
&lt;/ul>
&lt;table>
&lt;thead>
&lt;tr class="header">
&lt;th>Numeric Var(s)&lt;/th>
&lt;th>Cat. Var(s)&lt;/th>
&lt;th>Cat. Var Group #&lt;/th>
&lt;th>Cat. Var # of interest&lt;/th>
&lt;th>Parametric&lt;/th>
&lt;th>Paired&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr class="odd">
&lt;td>0&lt;/td>
&lt;td>1&lt;/td>
&lt;td>2&lt;/td>
&lt;td>1&lt;/td>
&lt;td>N/A&lt;/td>
&lt;td>No&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;ul>
&lt;li>&lt;p>&lt;strong>Effect size calculation:&lt;/strong> &lt;span class="math display">\[Effect\:Size(h)=2\arcsin(\sqrt{p_{H_1}})-2\arcsin(\sqrt{p_{H_0}})\]&lt;/span>&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>Example:(1)&lt;/strong> &lt;strong>Is there a significant difference in cancer prevalence between middle-aged women who have a sister with breast cancer (5%) and the general population (2%)?&lt;/strong>&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>Solution:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;p>Here, &lt;span class="math inline">\(H_0: p = 0.02\)&lt;/span> and &lt;span class="math inline">\(H_1: p \neq 0.02\)&lt;/span>&lt;/p>&lt;/li>
&lt;li>&lt;p>You don’t have background info, so you guess that there is a small effect size.&lt;/p>&lt;/li>
&lt;li>&lt;p>For h-tests: &lt;strong>0.2=small&lt;/strong>, &lt;strong>0.5=medium&lt;/strong>, and &lt;strong>0.8=large&lt;/strong> effect sizes.&lt;/p>&lt;/li>
&lt;li>&lt;p>Selected Two-sided, because we don’t care about directionality.&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>R Package:&lt;/strong> &lt;em>pwr&lt;/em> Package&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>R function:&lt;/strong> &lt;em>pwr.p.test(h = , sig.level = , power = , alternative = "two.sided", "less", or "greater")&lt;/em>&lt;/p>
&lt;ul>
&lt;li>h= effect size&lt;/li>
&lt;li>sig.level= significance level&lt;/li>
&lt;li>power= power of test&lt;/li>
&lt;li>alternative= type of tail&lt;/li>
&lt;/ul>&lt;/li>
&lt;li>&lt;p>&lt;strong>Answer of the problem:&lt;/strong>&lt;/p>
&lt;pre class="r">&lt;code>print(paste0(&amp;quot;The Sample Size is :&amp;quot;,round( pwr.p.test(h=0.2, sig.level=0.05, power=0.80, alternative=&amp;quot;two.sided&amp;quot;)$n,0)))&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] &amp;quot;The Sample Size is :196&amp;quot;&lt;/code>&lt;/pre>&lt;/li>
&lt;/ul>&lt;/li>
&lt;li>&lt;p>&lt;strong>Example:(2)&lt;/strong> &lt;strong>Calculate the sample size for the following scenarios (with α=0.05, and power=0.80):&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;p>&lt;strong>(i)&lt;/strong> You are interested in determining if the male incidence rate proportion of cancer in North Dakota is higher than the US average (prop=0.00490). You find trial data with a cancer prevalence of 0.00495.&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>(ii)&lt;/strong> You are interested in determining if the female incidence rate proportion of cancer in North Dakota is lower than the US average (prop=0.00420).&lt;/p>&lt;/li>
&lt;/ul>&lt;/li>
&lt;li>&lt;p>&lt;strong>Solution:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;ol style="list-style-type: lower-roman">
&lt;li>Here,&lt;/li>
&lt;/ol>
&lt;ul>
&lt;li>&lt;p>Effect size = &lt;span class="math inline">\(2*arcsin(\sqrt{0.00495})-2*arcsin(\sqrt{0.00490})=0.0007\)&lt;/span>. Note that in R, arcsin is the function &lt;em>asin()&lt;/em>; &lt;em>pwr.p.test&lt;/em> performs the difference-of-proportion power calculation for the binomial distribution via this arcsine transformation.&lt;/p>&lt;/li>
&lt;li>&lt;p>One-sided test&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>R Code:&lt;/strong>&lt;/p>&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code> print(paste0(&amp;quot;The Sample Size is :&amp;quot;,round(pwr.p.test(h=0.0007, sig.level=0.05, power=0.80, alternative=&amp;quot;greater&amp;quot;)$n,0)))&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] &amp;quot;The Sample Size is :12617464&amp;quot;&lt;/code>&lt;/pre>&lt;/li>
&lt;li>&lt;ol start="2" style="list-style-type: lower-roman">
&lt;li>You are interested in determining if the female incidence rate proportion of cancer in North Dakota is lower than the US average (prop=0.00420).&lt;/li>
&lt;/ol>
&lt;ul>
&lt;li>&lt;p>Guess a very small effect size (h= -0.001; negative because the alternative is “less”)&lt;/p>
&lt;ul>
&lt;li>&lt;p>One-tailed test&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>R Code:&lt;/strong>&lt;/p>&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code>print(paste(&amp;quot;The Sample Size is :&amp;quot;,round(pwr.p.test(h=-0.001, sig.level=0.05, power=0.80, alternative=&amp;quot;less&amp;quot;)$n,0)))&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] &amp;quot;The Sample Size is : 6182557&amp;quot;&lt;/code>&lt;/pre>&lt;/li>
&lt;/ul>&lt;/li>
&lt;/ul>&lt;/li>
&lt;/ul>
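The tiny effect size in scenario (i) — and why it drives the sample size into the millions — can be reproduced with the normal-approximation formula \(n = ((z_{1-\alpha}+z_{power})/h)^2\). A sketch in Python (the document uses `pwr.p.test` in R; this is just a numeric cross-check using the rounded h = 0.0007 from the text):

```python
from math import asin, sqrt
from statistics import NormalDist

# arcsine-transformed effect size for scenario (i)
p0, p1 = 0.00490, 0.00495
h = 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p0))

# one-sided test, alpha = 0.05, power = 0.80
z = NormalDist().inv_cdf(0.95) + NormalDist().inv_cdf(0.80)
n = (z / 0.0007) ** 2       # rounded h, as in the text
print(round(h, 4), round(n))
```

This yields h ≈ 0.0007 and n ≈ 12,617,464, matching the `pwr.p.test` output above.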
&lt;/div>
&lt;div id="two-proportions-test" class="section level2">
&lt;h2>Two Proportions Test :&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Description:&lt;/strong> This tests when you have two groups and want to know whether the proportion in each group differs from the other.&lt;/li>
&lt;/ul>
&lt;table>
&lt;thead>
&lt;tr class="header">
&lt;th>Numeric Var(s)&lt;/th>
&lt;th>Cat. Var(s)&lt;/th>
&lt;th>Cat. Var Group #&lt;/th>
&lt;th>Cat. Var # of interest&lt;/th>
&lt;th>Parametric&lt;/th>
&lt;th>Paired&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr class="odd">
&lt;td>0&lt;/td>
&lt;td>2&lt;/td>
&lt;td>2&lt;/td>
&lt;td>2&lt;/td>
&lt;td>N/A&lt;/td>
&lt;td>No&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;ul>
&lt;li>&lt;p>&lt;strong>Effect size calculation:&lt;/strong> &lt;span class="math display">\[Effect\:Size(h)=2\arcsin(\sqrt{p_{H_1}})-2\arcsin(\sqrt{p_{H_0}})\]&lt;/span>&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>Example:(1)&lt;/strong> &lt;strong>Is the expected proportion of students passing a stats course taught by psychology teachers different from the observed proportion of students passing the same stats class taught by mathematics teachers?&lt;/strong>&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>Solution:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;p>Here, &lt;span class="math inline">\(H_0: p_1 = p_2\)&lt;/span> and &lt;span class="math inline">\(H_1: p_1 \neq p_2\)&lt;/span>&lt;/p>&lt;/li>
&lt;li>&lt;p>You don’t have background info, so you guess that there is a small effect size.&lt;/p>&lt;/li>
&lt;li>&lt;p>For h-tests: &lt;strong>0.2=small&lt;/strong>, &lt;strong>0.5=medium&lt;/strong>, and &lt;strong>0.8=large&lt;/strong> effect sizes.&lt;/p>&lt;/li>
&lt;li>&lt;p>Selected Two-sided, because we don’t care about directionality.&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>R Package:&lt;/strong> &lt;em>pwr&lt;/em> Package&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>R function:&lt;/strong> &lt;em>pwr.2p.test(h = , sig.level = , power = , alternative = "two.sided", "less", or "greater")&lt;/em>&lt;/p>
&lt;ul>
&lt;li>h= effect size&lt;/li>
&lt;li>sig.level= significance level&lt;/li>
&lt;li>power= power of test&lt;/li>
&lt;li>alternative= type of tail&lt;/li>
&lt;/ul>&lt;/li>
&lt;li>&lt;p>&lt;strong>Answer of the problem:&lt;/strong>&lt;/p>
&lt;pre class="r">&lt;code>print(paste0(&amp;quot;The Sample Size is :&amp;quot;,round( pwr.2p.test(h=0.2, sig.level=0.05, power=.80, alternative=&amp;quot;two.sided&amp;quot;)$n,0)))&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] &amp;quot;The Sample Size is :392&amp;quot;&lt;/code>&lt;/pre>&lt;/li>
&lt;/ul>&lt;/li>
&lt;li>&lt;p>&lt;strong>Example:(2)&lt;/strong> &lt;strong>Calculate the sample size for the following scenarios (with α=0.05, and power=0.80):&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;p>&lt;strong>(i)&lt;/strong> You are interested in determining if the expected proportion (P1) of students passing a stats course taught by psychology teachers is different from the observed proportion (P2) of students passing the same stats class taught by biology teachers. You collected the following data of passed tests.&lt;/p>
&lt;table>
&lt;thead>
&lt;tr class="header">
&lt;th>Teaching Method&lt;/th>
&lt;th align="center">Response&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr class="odd">
&lt;td>Psychology&lt;/td>
&lt;td align="center">Yes, Yes, Yes, No, No, Yes, Yes, Yes, Yes, No&lt;/td>
&lt;/tr>
&lt;tr class="even">
&lt;td>Biology&lt;/td>
&lt;td align="center">No, No, Yes, Yes, Yes, No, Yes, No, Yes, Yes&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>&lt;/li>
&lt;li>&lt;p>&lt;strong>(ii)&lt;/strong> You are interested in determining if the expected proportion (P1) of female students who selected YES on a question was higher than the observed proportion (P2) of male students who selected YES. The observed proportion of males who selected yes was 0.75.&lt;/p>&lt;/li>
&lt;/ul>&lt;/li>
&lt;li>&lt;p>&lt;strong>Solution:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;ol style="list-style-type: lower-roman">
&lt;li>Here,&lt;/li>
&lt;/ol>
&lt;ul>
&lt;li>&lt;p>&lt;span class="math inline">\(p_1=7/10=0.70, p_2=6/10=0.60\)&lt;/span>
Note that, you can calculate &lt;span class="math inline">\(SS_T\)&lt;/span> &amp;amp; &lt;span class="math inline">\(TSS\)&lt;/span> by performing ANOVA on the dataset using &lt;em>aov()&lt;/em> function.&lt;/p>&lt;/li>
&lt;li>&lt;p>Effect size= &lt;span class="math inline">\(h= 2*asin(\sqrt{0.60})-2*asin(\sqrt{0.70})=-0.21\)&lt;/span>&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>R Code:&lt;/strong>&lt;/p>&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code> print(paste0(&amp;quot;The Sample Size is :&amp;quot;,round(pwr.2p.test(h=-0.21, sig.level=0.05, power=0.80, alternative=&amp;quot;two.sided&amp;quot;)$n,0)))&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] &amp;quot;The Sample Size is :356&amp;quot;&lt;/code>&lt;/pre>&lt;/li>
&lt;li>&lt;ol start="2" style="list-style-type: lower-roman">
&lt;li>You are interested in determining if the expected proportion (P1) of female students who selected YES was higher than the observed proportion (P2, 0.75) of male students who selected YES.&lt;/li>
&lt;/ol>
&lt;ul>
&lt;li>&lt;p>Guess that the expected proportion &lt;span class="math inline">\((p_1)\)&lt;/span> =0.85&lt;/p>&lt;/li>
&lt;li>&lt;p>Effect Size= &lt;span class="math inline">\(h= 2*asin(\sqrt{0.85})-2*asin(\sqrt{0.75})=0.25\)&lt;/span>&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>R Code:&lt;/strong>&lt;/p>&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code>print(paste(&amp;quot;The Sample Size is :&amp;quot;,round(pwr.2p.test(h=0.25, sig.level=0.05, power=0.80, alternative=&amp;quot;greater&amp;quot;)$n,0)))&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] &amp;quot;The Sample Size is : 198&amp;quot;&lt;/code>&lt;/pre>&lt;/li>
&lt;/ul>&lt;/li>
&lt;/ul>
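The h value for scenario (i) and the resulting per-group sample size can be cross-checked numerically. A sketch in Python (the n formula uses the standard two-proportion normal approximation \(n = 2((z_{1-\alpha/2}+z_{power})/h)^2\) with the rounded |h| = 0.21 from the text):

```python
from math import asin, sqrt
from statistics import NormalDist

# effect size for scenario (i): p1 = 7/10 (psychology), p2 = 6/10 (biology)
h = 2 * asin(sqrt(0.60)) - 2 * asin(sqrt(0.70))

# per-group sample size, two-sided, alpha = 0.05, power = 0.80
z = NormalDist().inv_cdf(0.975) + NormalDist().inv_cdf(0.80)
n = 2 * (z / 0.21) ** 2
print(round(h, 2), round(n))
```

This yields h ≈ -0.21 and n ≈ 356 per group, matching the `pwr.2p.test` output above.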
&lt;/div>
&lt;div id="chi-squared-test" class="section level2">
&lt;h2>Chi-Squared Test :&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Description:&lt;/strong> This tests whether the observed proportions across the categories of one or more categorical variables differ from the expected proportions (goodness of fit), or whether two categorical variables are associated.&lt;/li>
&lt;/ul>
&lt;table>
&lt;thead>
&lt;tr class="header">
&lt;th>Numeric Var(s)&lt;/th>
&lt;th>Cat. Var(s)&lt;/th>
&lt;th>Cat. Var Group #&lt;/th>
&lt;th>Cat. Var # of interest&lt;/th>
&lt;th>Parametric&lt;/th>
&lt;th>Paired&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr class="odd">
&lt;td>&lt;span class="math inline">\(0\)&lt;/span>&lt;/td>
&lt;td>&lt;span class="math inline">\(\geq 1\)&lt;/span>&lt;/td>
&lt;td>&lt;span class="math inline">\(\geq 2\)&lt;/span>&lt;/td>
&lt;td>1&lt;/td>
&lt;td>N/A&lt;/td>
&lt;td>No&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;ul>
&lt;li>&lt;p>&lt;strong>Effect size calculation:&lt;/strong> &lt;span class="math display">\[Effect\:Size(w)=\sqrt{\frac{{\chi}^2}{n\times df}}\]&lt;/span> where, &lt;span class="math display">\[{\chi}^2=\sum{\frac{(O_i-E_i)^2}{E_i}}\]&lt;/span>&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>Example:(1)&lt;/strong> &lt;strong>Do the observed proportions of phenotypes from a genetics experiment differ from the expected 9:3:3:1?&lt;/strong>&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>Solution:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;p>Here, &lt;span class="math inline">\(H_0\)&lt;/span>: the observed proportions equal the expected proportions, and &lt;span class="math inline">\(H_1\)&lt;/span>: they differ.&lt;/p>&lt;/li>
&lt;li>&lt;p>You don’t have background info, so you guess that there is a medium effect size.&lt;/p>&lt;/li>
&lt;li>&lt;p>For w-tests: &lt;strong>0.1=small&lt;/strong>, &lt;strong>0.3=medium&lt;/strong>, and &lt;strong>0.5=large&lt;/strong> effect sizes.&lt;/p>&lt;/li>
&lt;li>&lt;p>Degrees of freedom = (number of proportions − 1) = 4 (phenotypes) − 1 = 3&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>R Package:&lt;/strong> &lt;em>pwr&lt;/em> Package&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>R function:&lt;/strong> &lt;em>pwr.chisq.test(w =, df = , sig.level =, power = )&lt;/em>&lt;/p>
&lt;ul>
&lt;li>w= effect size&lt;/li>
&lt;li>df= degrees of freedom&lt;/li>
&lt;li>sig.level= significance level&lt;/li>
&lt;li>power= power of test&lt;/li>
&lt;/ul>&lt;/li>
&lt;li>&lt;p>&lt;strong>Answer of the problem:&lt;/strong>&lt;/p>
&lt;pre class="r">&lt;code>print(paste0(&amp;quot;The Sample Size is :&amp;quot;,round(pwr.chisq.test(w=0.3, df=3, sig.level=0.05, power=0.80)$N,0)))&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] &amp;quot;The Sample Size is :121&amp;quot;&lt;/code>&lt;/pre>&lt;/li>
&lt;/ul>&lt;/li>
&lt;li>&lt;p>&lt;strong>Example:(2)&lt;/strong> &lt;strong>Calculate the sample size for the following scenarios (with α=0.05, and power=0.80):&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;p>&lt;strong>(i)&lt;/strong> You are interested in determining if the ethnic ratios in a company differ by gender. You collect the following trial data from 200 employees.&lt;/p>
&lt;table>
&lt;thead>
&lt;tr class="header">
&lt;th align="left">Gender&lt;/th>
&lt;th align="right">White&lt;/th>
&lt;th align="right">Black&lt;/th>
&lt;th align="center">Am.Indian&lt;/th>
&lt;th align="center">Asian&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr class="odd">
&lt;td align="left">Male&lt;/td>
&lt;td align="right">0.60&lt;/td>
&lt;td align="right">0.25&lt;/td>
&lt;td align="center">0.01&lt;/td>
&lt;td align="center">0.14&lt;/td>
&lt;/tr>
&lt;tr class="even">
&lt;td align="left">Female&lt;/td>
&lt;td align="right">0.65&lt;/td>
&lt;td align="right">0.21&lt;/td>
&lt;td align="center">0.11&lt;/td>
&lt;td align="center">0.03&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>&lt;/li>
&lt;li>&lt;p>&lt;strong>(ii)&lt;/strong> You are interested in determining if the proportions of student by year (Freshman, Sophomore, Junior, Senior) is any different from 1:1:1:1. You collect the following trial data.&lt;/p>
&lt;table>
&lt;thead>
&lt;tr class="header">
&lt;th align="center">Variable&lt;/th>
&lt;th align="center">Values&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr class="odd">
&lt;td align="center">Student&lt;/td>
&lt;td align="center">1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20&lt;/td>
&lt;/tr>
&lt;tr class="even">
&lt;td align="center">Grade&lt;/td>
&lt;td align="center">Frs, Frs, Frs, Frs, Frs, Frs, Frs, Soph, Soph, Soph, Soph, Soph, Jun, Jun, Jun, Jun, Jun, Sen, Sen, Sen&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>&lt;/li>
&lt;/ul>&lt;/li>
&lt;li>&lt;p>&lt;strong>Solution:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;ol style="list-style-type: lower-roman">
&lt;li>Note that,&lt;/li>
&lt;/ol>
&lt;ul>
&lt;li>&lt;p>If the genders were alike, the expected ratios should be the same as the overall ethnic percentages (62.5, 23.0, 6.0, 8.5)&lt;/p>&lt;/li>
&lt;li>&lt;p>We will focus only on the males&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;span class="math inline">\(\chi^2= \sum{\frac{(O_i-E_i)^2}{E_i}} = (60-62.5)2/62.5 + (25-23)2/23 + (1-6)2/6 + (14-8.5)2/8.5 = 8\)&lt;/span>&lt;/p>&lt;/li>
&lt;li>&lt;p>Effect size= &lt;span class="math inline">\(w = \sqrt{\chi^2 /(n*df)}= \sqrt{8/(200*3)}=0.115\)&lt;/span>&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>R Code:&lt;/strong>&lt;/p>&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code> print(paste0(&amp;quot;The Sample Size is :&amp;quot;,round(pwr.chisq.test(w=0.115, df=3, sig.level=0.05, power=0.80)$N,0)))&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] &amp;quot;The Sample Size is :824&amp;quot;&lt;/code>&lt;/pre>&lt;/li>
&lt;li>&lt;ol start="2" style="list-style-type: lower-roman">
&lt;li>Note that here,&lt;/li>
&lt;/ol>
&lt;ul>
&lt;li>&lt;p>&lt;span class="math inline">\(\chi^2= \sum{\frac{(O_i-E_i)^2}{E_i}} = (7-5)^2/5 + (5-5)^2/5 + (5-5)^2/5 + (3-5)^2/5 = 1.6\)&lt;/span>&lt;/p>&lt;/li>
&lt;li>&lt;p>Effect Size= &lt;span class="math inline">\(w = \sqrt{\chi^2 /(n*df)}= \sqrt{1.6/(20*3)}=0.163\)&lt;/span>&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>R Code:&lt;/strong>&lt;/p>&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code>print(paste(&amp;quot;The Sample Size is :&amp;quot;,round(pwr.chisq.test(w=0.163, df=3, sig.level=0.05, power=0.80)$N,0)))&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] &amp;quot;The Sample Size is : 410&amp;quot;&lt;/code>&lt;/pre>&lt;/li>
&lt;/ul>&lt;/li>
&lt;/ul>
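Both χ² computations and the resulting w effect sizes are straightforward to verify by hand. A minimal cross-check in Python, using the observed/expected values from the two scenarios:

```python
from math import sqrt

def chi2_and_w(obs, exp, n, df):
    # chi-squared statistic and Cohen's w = sqrt(chi2 / (n * df))
    chi2 = sum((o - e) ** 2 / e for o, e in zip(obs, exp))
    return chi2, sqrt(chi2 / (n * df))

# (i) male ethnic percentages vs the overall expected percentages
chi2_i, w_i = chi2_and_w([60, 25, 1, 14], [62.5, 23.0, 6.0, 8.5], n=200, df=3)
# (ii) student counts by year vs a 1:1:1:1 split of 20 students
chi2_ii, w_ii = chi2_and_w([7, 5, 5, 3], [5, 5, 5, 5], n=20, df=3)
print(round(chi2_i), round(w_i, 3), round(chi2_ii, 1), round(w_ii, 3))
```

This reproduces χ² ≈ 8 with w ≈ 0.115 for (i) and χ² = 1.6 with w ≈ 0.163 for (ii).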
&lt;/div>
&lt;div id="simple-multiple-linear-regression" class="section level2">
&lt;h2>Simple &amp;amp; Multiple Linear Regression :&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Description:&lt;/strong> This test determines whether there is a significant relationship between two or more normally distributed numerical variables; the predictor variable(s) are used to predict the response variable.&lt;/li>
&lt;/ul>
&lt;table>
&lt;thead>
&lt;tr class="header">
&lt;th>Numeric Var(s)&lt;/th>
&lt;th>Cat. Var(s)&lt;/th>
&lt;th>Cat. Var Group #&lt;/th>
&lt;th>Cat. Var # of interest&lt;/th>
&lt;th>Parametric&lt;/th>
&lt;th>Paired&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr class="odd">
&lt;td>2 or &amp;gt;2&lt;/td>
&lt;td>0&lt;/td>
&lt;td>NA&lt;/td>
&lt;td>NA&lt;/td>
&lt;td>Yes&lt;/td>
&lt;td>No&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;ul>
&lt;li>&lt;p>&lt;strong>Effect size calculation:&lt;/strong> &lt;span class="math display">\[Effect\:Size(f2)=\sqrt{R^2}\]&lt;/span> Where, &lt;span class="math display">\[R^2= Goodness\:of \:fit\:measure(i.e., Adjusted\:R^2)\]&lt;/span>&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>Example:(1)&lt;/strong> &lt;strong>Is there a relationship between height and weight in college males?&lt;/strong>&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>Solution:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;p>Here, &lt;span class="math inline">\(H_0: f^2 = 0\)&lt;/span> and &lt;span class="math inline">\(H_1: f^2 > 0\)&lt;/span>&lt;/p>&lt;/li>
&lt;li>&lt;p>You don’t have background info, so you guess that there is a large effect size.&lt;/p>&lt;/li>
&lt;li>&lt;p>For &lt;span class="math inline">\(f^2\)&lt;/span>: &lt;strong>0.02=small&lt;/strong>, &lt;strong>0.15=medium&lt;/strong>, and &lt;strong>0.35=large&lt;/strong> effect sizes.&lt;/p>&lt;/li>
&lt;li>&lt;p>For simple regression (only one predictor variable), the numerator df = 1; for multiple regression it is the number of predictor variables.&lt;/p>&lt;/li>
&lt;li>&lt;p>The output is the denominator degrees of freedom (v) rather than the sample size. Since &lt;span class="math inline">\(v = n - p - 1\)&lt;/span> (where p = number of predictors), round v up and add &lt;span class="math inline">\(p + 1\)&lt;/span> to get the sample size — i.e., add 2 for simple linear regression, and add p + 1 for multiple linear regression.&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>R Package:&lt;/strong> &lt;em>pwr&lt;/em> Package&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>R function:&lt;/strong> &lt;em>pwr.f2.test(u =, v= , f2=, sig.level =, power = )&lt;/em>&lt;/p>
&lt;ul>
&lt;li>u= numerator degrees of freedom&lt;/li>
&lt;li>v= denominator degrees of freedom&lt;/li>
&lt;li>f2= effect size&lt;/li>
&lt;li>sig.level= significance level&lt;/li>
&lt;li>power= power of test&lt;/li>
&lt;/ul>&lt;/li>
&lt;li>&lt;p>To calculate sample size: Sample Size(n)= &lt;strong>(denominator degrees of freedom(v) + Total No. of variables)&lt;/strong>&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>Answer of the problem:&lt;/strong>&lt;/p>
&lt;pre class="r">&lt;code>print(paste0(&amp;quot;The Sample Size is :&amp;quot;,round( pwr.f2.test(u=1, f2=0.35, sig.level=0.05, power=0.80)$v,0)+2)) ##--2 has add because it is a simple linear regression&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] &amp;quot;The Sample Size is :25&amp;quot;&lt;/code>&lt;/pre>&lt;/li>
&lt;/ul>&lt;/li>
&lt;li>&lt;p>&lt;strong>Example: (2)&lt;/strong> &lt;strong>You are interested in determining if height (meters), weight (grams), and fertilizer added (grams) in plants can predict yield (grams of berries). You collect the following trial data. Here &lt;span class="math inline">\(\alpha=0.05\)&lt;/span>, &amp;amp; &lt;span class="math inline">\(Power=(1-\beta)=80\%\)&lt;/span>&lt;/strong>&lt;/p>
&lt;table>
&lt;thead>
&lt;tr class="header">
&lt;th align="left">Variables&lt;/th>
&lt;th align="center">Values&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr class="odd">
&lt;td align="left">Yield&lt;/td>
&lt;td align="center">46.8, 48.7, 48.4, 53.7, 56.7&lt;/td>
&lt;/tr>
&lt;tr class="even">
&lt;td align="left">Height&lt;/td>
&lt;td align="center">14.6, 19.6, 18.6, 25.5, 20.4&lt;/td>
&lt;/tr>
&lt;tr class="odd">
&lt;td align="left">Weight&lt;/td>
&lt;td align="center">95.3, 99.5, 94.1, 110, 103&lt;/td>
&lt;/tr>
&lt;tr class="even">
&lt;td align="left">Fertilizer&lt;/td>
&lt;td align="center">2.1, 3.2, 4.3, 1.1, 4.3&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>&lt;/li>
&lt;li>&lt;p>&lt;strong>Solution:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;p>Here, at first we have to find the &lt;span class="math inline">\(Adjusted\:R^2\)&lt;/span> value by fitting the linear model.&lt;/p>&lt;/li>
&lt;li>&lt;p>Then, we will find the sample size.&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>R Code :&lt;/strong>&lt;/p>&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code>#--Data--#
yield= c(46.8, 48.7, 48.4, 53.7, 56.7)
height= c(14.6, 19.6, 18.6, 25.5, 20.4)
weight= c(95.3, 99.5, 94.1, 110, 103)
Fert= c(2.1, 3.2, 4.3, 1.1, 4.3)
#-- Fitting Linear Model --#
Model= lm(height~yield + weight + Fert)
#-- Extracting Adjusted R^2 Value --#
R_Sqared= summary(Model)$adj.r.squared
#-- Calculating Effect (f2) --#
f.2= sqrt(R_Sqared)
#-- Calculating sample size --#
##--4 has added because it is a multiple linear Regression with 3 predictors and one dependent variable--##
print(paste0(&amp;quot;The Sample Size is :&amp;quot;,round( pwr.f2.test(u=1, f2=f.2, sig.level=0.05, power=0.80)$v,0)+4))&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] &amp;quot;The Sample Size is :14&amp;quot;&lt;/code>&lt;/pre>&lt;/li>
&lt;/ul>
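Because `pwr.f2.test` returns the denominator degrees of freedom v rather than n, recovering the sample size is just bookkeeping: the residual df of a linear model is \(v = n - p - 1\), so \(n = v + p + 1\). A sketch, with v values back-solved from the two printed sample sizes above (23 and 10 are implied, not stated in the text):

```python
def n_from_v(v, predictors):
    # residual df of a linear model: v = n - p - 1, so n = v + p + 1
    return round(v) + predictors + 1

print(n_from_v(23, 1))  # simple regression example: v ~ 23 plus intercept and 1 predictor
print(n_from_v(10, 3))  # multiple regression example: v ~ 10 plus intercept and 3 predictors
```

These give back the sample sizes of 25 and 14 reported above.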
&lt;/div>
&lt;div id="correlation" class="section level2">
&lt;h2>Correlation :&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Description:&lt;/strong> This test determines whether there is a linear association between two numerical variables. It is like simple regression, but not identical.&lt;/li>
&lt;/ul>
&lt;table>
&lt;thead>
&lt;tr class="header">
&lt;th>Numeric Var(s)&lt;/th>
&lt;th>Cat. Var(s)&lt;/th>
&lt;th>Cat. Var Group #&lt;/th>
&lt;th>Cat. Var # of interest&lt;/th>
&lt;th>Parametric&lt;/th>
&lt;th>Paired&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr class="odd">
&lt;td>2&lt;/td>
&lt;td>0&lt;/td>
&lt;td>NA&lt;/td>
&lt;td>NA&lt;/td>
&lt;td>Yes&lt;/td>
&lt;td>No&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;ul>
&lt;li>&lt;p>&lt;strong>Effect size calculation:&lt;/strong> Effect Size= r= Correlation Coefficient&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>Example:(1)&lt;/strong> &lt;strong>Is there a correlation between hours studied and test score?&lt;/strong>&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>Solution:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;p>Here, &lt;span class="math inline">\(H_0\::r=0\)&lt;/span> and &lt;span class="math inline">\(H_1\:: r\neq 0\)&lt;/span>&lt;/p>&lt;/li>
&lt;li>&lt;p>You don’t have background info, so you guess that there is a large correlation.&lt;/p>&lt;/li>
&lt;li>&lt;p>For Correlation levels (r): &lt;strong>0.1=small&lt;/strong>, &lt;strong>0.3=medium&lt;/strong>, and &lt;strong>0.5=large&lt;/strong> correlations.&lt;/p>&lt;/li>
&lt;li>&lt;p>Here the approximate correlation power calculation is done via the arctanh (Fisher’s &lt;em>z&lt;/em>) transformation.&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>R Package:&lt;/strong> &lt;em>pwr&lt;/em> Package&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>R function:&lt;/strong> &lt;em>pwr.r.test(r = , sig.level = , power = )&lt;/em>&lt;/p>
&lt;ul>
&lt;li>r= correlation&lt;/li>
&lt;li>sig.level= significance level&lt;/li>
&lt;li>power= power of test&lt;/li>
&lt;/ul>&lt;/li>
&lt;li>&lt;p>&lt;strong>Answer of the problem:&lt;/strong>&lt;/p>
&lt;pre class="r">&lt;code>print(paste0(&amp;quot;The Sample Size is :&amp;quot;,round(pwr.r.test(r=0.5, sig.level=0.05, power=0.80)$n,0)))&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] &amp;quot;The Sample Size is :28&amp;quot;&lt;/code>&lt;/pre>&lt;/li>
&lt;/ul>&lt;/li>
&lt;li>&lt;p>&lt;strong>Example:(2)&lt;/strong> &lt;strong>Calculate the sample size for the following scenarios (with α=0.05, and power=0.80):&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;p>&lt;strong>(i)&lt;/strong> You are interested in determining if there is a correlation between height and weight in men.&lt;/p>
&lt;table>
&lt;thead>
&lt;tr class="header">
&lt;th align="center">Males&lt;/th>
&lt;th align="center">Measures&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr class="odd">
&lt;td align="center">Height&lt;/td>
&lt;td align="center">178, 166, 172, 186, 182&lt;/td>
&lt;/tr>
&lt;tr class="even">
&lt;td align="center">Weight&lt;/td>
&lt;td align="center">165, 139, 257, 225, 196&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>&lt;/li>
&lt;li>&lt;p>&lt;strong>(ii)&lt;/strong> You are interested in determining whether, in lab mice, there is a correlation between longevity (in months) and average protein intake (in grams).&lt;/p>&lt;/li>
&lt;/ul>&lt;/li>
&lt;li>&lt;p>&lt;strong>Solution:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;ol style="list-style-type: lower-roman">
&lt;li>Here,&lt;/li>
&lt;/ol>
&lt;ul>
&lt;li>&lt;p>first, calculate the correlation value, and then calculate the sample size.&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>R Code:&lt;/strong>&lt;/p>&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code>#-- Data --#
MH= c(178,166,172,186,182)
MW= c(165,139,257,225,196)
#-- correlation value --#
r= cor(MH,MW)   # r is approximately 0.37
print(paste0(&amp;quot;The Sample Size is :&amp;quot;,round(pwr.r.test(r=0.37, sig.level=0.05, power=0.80)$n,0)))&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] &amp;quot;The Sample Size is :54&amp;quot;&lt;/code>&lt;/pre>&lt;/li>
&lt;li>&lt;p>(ii) You are interested in determining whether, in lab mice, there is a correlation between longevity (in months) and average protein intake (in grams).&lt;/p>
&lt;ul>
&lt;li>&lt;p>Guessed large (0.5) correlation&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>R Code:&lt;/strong>&lt;/p>&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code>print(paste(&amp;quot;The Sample Size is :&amp;quot;,round(pwr.r.test(r=0.5, sig.level=0.05, power=0.80)$n,0)))&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] &amp;quot;The Sample Size is : 28&amp;quot;&lt;/code>&lt;/pre>&lt;/li>
&lt;/ul>&lt;/li>
&lt;/ul>
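As a cross-check on the <em>pwr.r.test</em> results above, the arctanh (Fisher's z) approximation can be written out in a few lines of base R. This is a rough sketch (the function name is just illustrative); the normal approximation lands within about one unit of pwr.r.test's answer of 28.

```r
# Approximate sample size for testing H0: r = 0 using Fisher's z
# (arctanh) transformation: z = atanh(r) has SE ~ 1/sqrt(n - 3)
approx_n_cor = function(r, sig.level = 0.05, power = 0.80) {
  z_alpha = qnorm(1 - sig.level / 2)  # two-sided critical value
  z_beta  = qnorm(power)
  ((z_alpha + z_beta) / atanh(r))^2 + 3
}

approx_n_cor(0.5)  # roughly 29, close to pwr.r.test's 28
```

The "+3" term comes from the 1/sqrt(n - 3) standard error of the transformed correlation.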
&lt;/div>
&lt;div id="non-parametric-t-tests" class="section level2">
&lt;h2>Non-Parametric T-tests :&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Description:&lt;/strong> non-parametric counterparts of the t-tests, used when the data do not satisfy the parametric (normality) assumptions.
&lt;ul>
&lt;li>&lt;em>&lt;span class="math inline">\(\color{red}{\text{One Mean Wilcoxon:}}\)&lt;/span>&lt;/em> sample mean against set value&lt;/li>
&lt;li>&lt;em>&lt;span class="math inline">\(\color{red}{\text{Mann-Whitney:}}\)&lt;/span>&lt;/em> two sample means (unpaired)&lt;/li>
&lt;li>&lt;em>&lt;span class="math inline">\(\color{red}{\text{Paired Wilcoxon:}}\)&lt;/span>&lt;/em> two sample means (paired)&lt;/li>
&lt;/ul>&lt;/li>
&lt;/ul>
&lt;table>
&lt;thead>
&lt;tr class="header">
&lt;th align="left">Name&lt;/th>
&lt;th>Numeric Var(s)&lt;/th>
&lt;th>Cat. Var(s)&lt;/th>
&lt;th>Cat. Var Group #&lt;/th>
&lt;th>Cat. Var # of interest&lt;/th>
&lt;th>Parametric&lt;/th>
&lt;th>Paired&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr class="odd">
&lt;td align="left">&lt;span class="math inline">\(\color{red}{\text{One Mean Wilcoxon:}}\)&lt;/span>&lt;/td>
&lt;td>1&lt;/td>
&lt;td>0&lt;/td>
&lt;td>0&lt;/td>
&lt;td>0&lt;/td>
&lt;td>No&lt;/td>
&lt;td>NA&lt;/td>
&lt;/tr>
&lt;tr class="even">
&lt;td align="left">&lt;span class="math inline">\(\color{red}{\text{Mann-Whitney:}}\)&lt;/span>&lt;/td>
&lt;td>1&lt;/td>
&lt;td>1&lt;/td>
&lt;td>2&lt;/td>
&lt;td>1&lt;/td>
&lt;td>No&lt;/td>
&lt;td>No&lt;/td>
&lt;/tr>
&lt;tr class="odd">
&lt;td align="left">&lt;span class="math inline">\(\color{red}{\text{Paired Wilcoxon:}}\)&lt;/span>&lt;/td>
&lt;td>1&lt;/td>
&lt;td>1&lt;/td>
&lt;td>2&lt;/td>
&lt;td>1&lt;/td>
&lt;td>No&lt;/td>
&lt;td>Yes&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;ul>
&lt;li>&lt;p>&lt;strong>Effect size calculation:&lt;/strong> &lt;span class="math inline">\(\text{Effect Size}\;(\text{Cohen’s D:})= \frac{{|{\mu}_{H_1}-{\mu}_{H_0}}|}{\sigma};\frac{{|{\mu}_{H_1}-{\mu}_{H_0}}|}{\sigma_{pooled}};\frac{{\mu}_{\text{diff}}}{\sigma_{\text{diff}}}\)&lt;/span>&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>&lt;a href="Example:(1)" class="uri">Example:(1)&lt;/a>&lt;/strong> &lt;strong>(for t-tests, 0.2=small, 0.5=medium, and 0.8 large effect sizes)&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;ol style="list-style-type: decimal">
&lt;li>&lt;em>&lt;span class="math inline">\(\color{red}{\text{One Mean Wilcoxon:}}\)&lt;/span>&lt;/em> &lt;strong>Is the average number of children in Grand Forks families different than 1?&lt;/strong>&lt;/li>
&lt;/ol>&lt;/li>
&lt;li>&lt;strong>Solution:&lt;/strong>
&lt;ul>
&lt;li>&lt;p>Here, &lt;span class="math inline">\(H_0\:: 1\;\text{child}\)&lt;/span> and &lt;span class="math inline">\(H_1\:: &amp;gt;1\;\text{child}\)&lt;/span>&lt;/p>&lt;/li>
&lt;li>&lt;p>You don’t have background info, so you guess that there is a &lt;strong>medium effect size.&lt;/strong>&lt;/p>&lt;/li>
&lt;li>&lt;p>Select one-tailed (greater)&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>R Package:&lt;/strong> &lt;em>pwr&lt;/em> Package&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>R function:&lt;/strong> &lt;em>pwr.t.test(d = , sig.level = , power = , type = c(“two.sample”, “&lt;strong>one.sample&lt;/strong>”, “paired”)) + 15%&lt;/em>&lt;/p>
&lt;ul>
&lt;li>d= effect size&lt;/li>
&lt;li>sig.level= significance level&lt;/li>
&lt;li>power= power of test&lt;/li>
&lt;li>type= type of test&lt;/li>
&lt;/ul>&lt;/li>
&lt;li>&lt;p>&lt;strong>Answer of the problem:&lt;/strong>&lt;/p>&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code>Pwer_t=pwr.t.test(d=0.5, sig.level=0.05, power=0.80, type=&amp;quot;one.sample&amp;quot;, alternative=&amp;quot;greater&amp;quot;)
##-- Nonparametric Correction : adding 15% --##
print(paste0(&amp;quot;Sample Size : &amp;quot;,round((Pwer_t$n*1.15),0)))&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] &amp;quot;Sample Size : 30&amp;quot;&lt;/code>&lt;/pre>&lt;/li>
&lt;li>&lt;ol start="2" style="list-style-type: decimal">
&lt;li>&lt;em>&lt;span class="math inline">\(\color{red}{\text{Mann-Whitney:}}\)&lt;/span>&lt;/em> &lt;strong>Does the average number of snacks per day for individuals on a diet differ between young and old persons?&lt;/strong>&lt;/li>
&lt;/ol>&lt;/li>
&lt;li>&lt;strong>Solution:&lt;/strong>
&lt;ul>
&lt;li>&lt;p>Here, &lt;span class="math inline">\(H_0\:: 0\;\text{difference in snack number, }\)&lt;/span> and &lt;span class="math inline">\(H_1\:: \neq 0\;\text{difference in snack number}\)&lt;/span>&lt;/p>&lt;/li>
&lt;li>&lt;p>You don’t have background info, so you guess that there is a &lt;strong>small effect size&lt;/strong>&lt;/p>&lt;/li>
&lt;li>&lt;p>Select two-sided&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>R Package:&lt;/strong> &lt;em>pwr&lt;/em> Package&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>R function:&lt;/strong> &lt;em>pwr.t.test(d = , sig.level = , power = , type = c(“&lt;strong>two.sample&lt;/strong>”, “one.sample”, “paired”)) + 15%&lt;/em>&lt;/p>
&lt;ul>
&lt;li>d= effect size&lt;/li>
&lt;li>sig.level= significance level&lt;/li>
&lt;li>power= power of test&lt;/li>
&lt;li>type= type of test&lt;/li>
&lt;/ul>&lt;/li>
&lt;li>&lt;p>Note: &lt;a href="https://www.graphpad.com/guides/prism/7/statistics/stat_sample_size_for_nonparametric_.htm">&lt;strong>“Parametric t-test + 15% Approach”&lt;/strong> for calculating Sample Size for Non Parametric test&lt;/a>&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>Answer of the problem:&lt;/strong>&lt;/p>&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code>Pwer_t=pwr.t.test(d=0.2, sig.level=0.05, power=0.80, type=&amp;quot;two.sample&amp;quot;, alternative=&amp;quot;two.sided&amp;quot;)
##-- Nonparametric Correction : adding 15% --##
print(paste0(&amp;quot;Sample Size : &amp;quot;,round((Pwer_t$n*1.15),0)))&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] &amp;quot;Sample Size : 452&amp;quot;&lt;/code>&lt;/pre>&lt;/li>
&lt;li>&lt;ol start="3" style="list-style-type: decimal">
&lt;li>&lt;em>&lt;span class="math inline">\(\color{red}{\text{Paired Wilcoxon:}}\)&lt;/span>&lt;/em> &lt;strong>Is genome methylation patterns different between identical twins?&lt;/strong>&lt;/li>
&lt;/ol>&lt;/li>
&lt;li>&lt;strong>Solution:&lt;/strong>
&lt;ul>
&lt;li>&lt;p>Here, &lt;span class="math inline">\(H_0\::\text{0% methylation}\)&lt;/span> and &lt;span class="math inline">\(H_1\:: \neq \text{0% methylation}\)&lt;/span>&lt;/p>&lt;/li>
&lt;li>&lt;p>You don’t have background info, so you guess that there is a &lt;strong>large effect size&lt;/strong>&lt;/p>&lt;/li>
&lt;li>&lt;p>Select one-tailed (greater)&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>R Package:&lt;/strong> &lt;em>pwr&lt;/em> Package&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>R function:&lt;/strong> &lt;em>pwr.t.test(d = , sig.level = , power = , type = c(“two.sample”, “one.sample”, &lt;strong>“paired”&lt;/strong>)) + 15%&lt;/em>&lt;/p>
&lt;ul>
&lt;li>d= effect size&lt;/li>
&lt;li>sig.level= significance level&lt;/li>
&lt;li>power= power of test&lt;/li>
&lt;li>type= type of test&lt;/li>
&lt;/ul>&lt;/li>
&lt;li>&lt;p>&lt;strong>Answer of the problem:&lt;/strong>&lt;/p>&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code>Pwer_t= pwr.t.test(d=0.8, sig.level=0.05, power=0.80, type=&amp;quot;paired&amp;quot;, alternative=&amp;quot;greater&amp;quot;)
##-- Nonparametric Correction : adding 15% --##
print(paste0(&amp;quot;Sample Size : &amp;quot;,round((Pwer_t$n*1.15),0)))&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] &amp;quot;Sample Size : 13&amp;quot;&lt;/code>&lt;/pre>&lt;/li>
&lt;/ul>&lt;/li>
&lt;li>&lt;p>&lt;strong>&lt;a href="Example:(2)" class="uri">Example:(2)&lt;/a>&lt;/strong> &lt;strong>Calculate the sample size for the following scenarios (with &lt;span class="math inline">\(\alpha=0.05\)&lt;/span>, and power=0.80):&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;p>&lt;strong>(i)&lt;/strong> You are interested in determining if the average number of pets in Grand Forks families is greater than 1. You collect the following trial data for pet number.&lt;/p>
&lt;table>
&lt;thead>
&lt;tr class="header">
&lt;th align="center">Variable&lt;/th>
&lt;th align="center">Values&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr class="odd">
&lt;td align="center">Pets&lt;/td>
&lt;td align="center">1, 1, 1, 3, 2, 1, 0, 0, 0, 4&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>&lt;/li>
&lt;li>&lt;p>&lt;strong>(ii)&lt;/strong> You are interested in determining if the number of meals per day for individuals on a diet is higher in younger people than older. You collected trial data on meals per day.&lt;/p>
&lt;table>
&lt;thead>
&lt;tr class="header">
&lt;th align="center">Variable&lt;/th>
&lt;th align="center">Values&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr class="odd">
&lt;td align="center">Young meals&lt;/td>
&lt;td align="center">1, 2, 2, 3, 3, 3, 3, 4&lt;/td>
&lt;/tr>
&lt;tr class="even">
&lt;td align="center">Older meals&lt;/td>
&lt;td align="center">1, 1, 1, 2, 2, 2, 3, 3&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>&lt;/li>
&lt;li>&lt;p>&lt;strong>(iii)&lt;/strong> You are interested in determining if genome methylation patterns are higher in the first fraternal twin born compared to the second. You collected the following trial data on methylation level difference (in percentage).&lt;/p>
&lt;table>
&lt;thead>
&lt;tr class="header">
&lt;th align="center">Variable&lt;/th>
&lt;th align="center">Values&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr class="odd">
&lt;td align="center">Methy.Diff(%)&lt;/td>
&lt;td align="center">5.96, 5.63, 1.25, 1.17, 3.59, 1.64, 1.6, 1.4&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>&lt;/li>
&lt;/ul>&lt;/li>
&lt;li>&lt;p>&lt;strong>Solution:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;ol style="list-style-type: lower-roman">
&lt;li>Here, you want to know whether the average number of pets in Grand Forks families is greater than 1. From the trial data, the sample mean is 1.3 pets (SD=1.34).&lt;/li>
&lt;/ol>
&lt;ul>
&lt;li>&lt;p>Effect size = &lt;span class="math inline">\((Mean_{H_1}-Mean_{H_0})/SD= (1.3-1.0)/1.34 =0.224\)&lt;/span>&lt;/p>&lt;/li>
&lt;li>&lt;p>One-tailed test&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>R Code:&lt;/strong>&lt;/p>&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code>Pwer_t= pwr.t.test(d=0.224, sig.level=0.05, power=0.80, type=&amp;quot;one.sample&amp;quot;, alternative=&amp;quot;greater&amp;quot;)
#-- Non-parametric Correction --#
print(paste0(&amp;quot;The Sample Size is :&amp;quot;,round(Pwer_t$n*1.15,0)))&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] &amp;quot;The Sample Size is :143&amp;quot;&lt;/code>&lt;/pre>&lt;/li>
&lt;li>&lt;ol start="2" style="list-style-type: lower-roman">
&lt;li>&lt;em>Try it by yourself.&lt;/em>&lt;/li>
&lt;/ol>&lt;/li>
&lt;li>&lt;ol start="3" style="list-style-type: lower-roman">
&lt;li>&lt;em>Try it by yourself.&lt;/em>&lt;/li>
&lt;/ol>&lt;/li>
&lt;/ul>&lt;/li>
&lt;/ul>
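The effect size used in solution (i) can be reproduced directly from the trial pet-count data, and the parametric sample size obtained with base R's own <em>power.t.test()</em> instead of the <em>pwr</em> package, before applying the 15% non-parametric correction. A sketch under the same assumptions as the solution above:

```r
pets = c(1, 1, 1, 3, 2, 1, 0, 0, 0, 4)

# One-sample Cohen's d against the hypothesized mean of 1
d = (mean(pets) - 1) / sd(pets)   # (1.3 - 1.0) / 1.34 = 0.224

# Parametric one-sample, one-sided sample size (base R stats, no pwr needed)
res = power.t.test(delta = d, sd = 1, sig.level = 0.05, power = 0.80,
                   type = "one.sample", alternative = "one.sided")

# Non-parametric correction: add 15%
round(res$n * 1.15)   # about 143, matching the pwr-based answer
```

Note that <em>power.t.test()</em> parameterizes the effect as delta/sd, so passing sd = 1 makes delta play the role of Cohen's d.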
&lt;/div>
&lt;div id="kruskal-wallace-test" class="section level2">
&lt;h2>Kruskal-Wallis Test :&lt;/h2>
&lt;ul>
&lt;li>&lt;p>&lt;strong>Description:&lt;/strong> this tests whether at least one mean differs among more than two groups for a non-normally distributed variable (AKA the non-parametric ANOVA). There really isn’t a good way of calculating its sample size in R, but you can use a rule of thumb:&lt;/p>
&lt;ul>
&lt;li>Run Parametric Test&lt;/li>
&lt;li>Add 15% to total sample size&lt;/li>
&lt;/ul>&lt;/li>
&lt;/ul>
&lt;table>
&lt;thead>
&lt;tr class="header">
&lt;th>Numeric Var(s)&lt;/th>
&lt;th>Cat. Var(s)&lt;/th>
&lt;th>Cat. Var Group #&lt;/th>
&lt;th>Cat. Var # of interest&lt;/th>
&lt;th>Parametric&lt;/th>
&lt;th>Paired&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr class="odd">
&lt;td>1&lt;/td>
&lt;td>1&lt;/td>
&lt;td>&amp;gt;2&lt;/td>
&lt;td>1&lt;/td>
&lt;td>No&lt;/td>
&lt;td>No&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;ul>
&lt;li>&lt;p>&lt;strong>Effect size calculation:&lt;/strong> Effect Size = Same as the effect size for the ANOVA.&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>&lt;a href="Example:(1)" class="uri">Example:(1)&lt;/a>&lt;/strong> ** Is there a difference in draft rank across 3 different months? **&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>Solution:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;p>Here, &lt;span class="math inline">\(H_0\::r=0\)&lt;/span> and &lt;span class="math inline">\(H_1\:: r\neq 0\)&lt;/span>&lt;/p>&lt;/li>
&lt;li>&lt;p>There will be a total of 3 groups (months)&lt;/p>&lt;/li>
&lt;li>&lt;p>You don’t have background info, so you guess that there is a &lt;strong>medium effect size.&lt;/strong>&lt;/p>&lt;/li>
&lt;li>&lt;p>For &lt;span class="math inline">\(\text{f-test}\)&lt;/span> : &lt;strong>0.1=small&lt;/strong>, &lt;strong>0.25=medium&lt;/strong>, and &lt;strong>0.4=large&lt;/strong> correlations.&lt;/p>&lt;/li>
&lt;li>&lt;p>There is no one-/two-tailed choice in ANOVA.&lt;/p>&lt;/li>
&lt;li>&lt;p>Groups assumed to be the same size.&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>R Package:&lt;/strong> &lt;em>pwr&lt;/em> Package&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>R function:&lt;/strong> &lt;em>pwr.anova.test(k =, f = , sig.level = , power = )&lt;/em>&lt;/p>
&lt;ul>
&lt;li>k= number of groups&lt;/li>
&lt;li>f= effect size&lt;/li>
&lt;li>sig.level= significance level&lt;/li>
&lt;li>power= power of test&lt;/li>
&lt;/ul>&lt;/li>
&lt;li>&lt;p>&lt;strong>Answer of the problem:&lt;/strong>&lt;/p>&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code> ##-- Balanced one-way analysis of variance power calculation --##
Pwr_Anova= pwr.anova.test(k =3 , f =0.25 , sig.level=0.05 , power =0.80 )
#-- Non-parametric Correction --#
print(paste0(&amp;quot;The Sample Size is :&amp;quot;,round((Pwr_Anova$n*1.15),0)))&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] &amp;quot;The Sample Size is :60&amp;quot;&lt;/code>&lt;/pre>&lt;/li>
&lt;li>&lt;p>&lt;strong>&lt;a href="Example:(2)" class="uri">Example:(2)&lt;/a>&lt;/strong> &lt;strong>Calculate the sample size for the following scenarios (with α=0.05, and power=0.80):&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;p>&lt;strong>(i)&lt;/strong> You are interested in determining whether there is a difference in hours worked across 3 different groups (faculty, staff, and hourly workers). You collect the following trial data of weekly hours.&lt;/p>
&lt;table>
&lt;thead>
&lt;tr class="header">
&lt;th align="center">Groups&lt;/th>
&lt;th align="center">Working Hours&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr class="odd">
&lt;td align="center">Faculty&lt;/td>
&lt;td align="center">42, 45, 46, 55, 42&lt;/td>
&lt;/tr>
&lt;tr class="even">
&lt;td align="center">Staff&lt;/td>
&lt;td align="center">46, 45, 37, 42, 40&lt;/td>
&lt;/tr>
&lt;tr class="odd">
&lt;td align="center">Hourly&lt;/td>
&lt;td align="center">29, 42, 33, 50, 23&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>&lt;/li>
&lt;li>&lt;p>&lt;strong>(ii)&lt;/strong> You are interested in determining whether there is a difference in assistant professor salaries across 25 different departments.&lt;/p>&lt;/li>
&lt;/ul>&lt;/li>
&lt;li>&lt;p>&lt;strong>Solution:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;ol style="list-style-type: lower-roman">
&lt;li>Here,&lt;/li>
&lt;/ol>
&lt;ul>
&lt;li>&lt;p>&lt;span class="math inline">\(\eta^2 = SS_T/TSS=286.5/(286.5+625.2) = 0.314\)&lt;/span>
Note that, you can calculate &lt;span class="math inline">\(SS_T\)&lt;/span> &amp;amp; &lt;span class="math inline">\(TSS\)&lt;/span> by performing ANOVA on the dataset using &lt;em>aov()&lt;/em> function.&lt;/p>&lt;/li>
&lt;li>&lt;p>Effect size&lt;span class="math inline">\((f)\)&lt;/span> = &lt;span class="math inline">\(\sqrt{\eta^2/(1-\eta^2)}=\sqrt{0.314/(1- 0.314)} = 0.677\)&lt;/span>&lt;/p>&lt;/li>
&lt;li>&lt;p>No. of groups= 3&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>R Code:&lt;/strong>&lt;/p>&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code> ##-- Balanced one-way analysis of variance power calculation --##
Pwr_Anova= pwr.anova.test(k =3, f =0.677, sig.level=0.05, power =0.80)
#-- Non-parametric Correction --#
print(paste0(&amp;quot;The Sample Size is :&amp;quot;,round((Pwr_Anova$n*1.15),0)))&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] &amp;quot;The Sample Size is :9&amp;quot;&lt;/code>&lt;/pre>&lt;/li>
&lt;li>&lt;ol start="2" style="list-style-type: lower-roman">
&lt;li>You are interested in determining whether there is a difference in assistant professor salaries across 25 different departments.&lt;/li>
&lt;/ol>
&lt;ul>
&lt;li>&lt;p>Guess small effect size (0.10)&lt;/p>&lt;/li>
&lt;li>&lt;p>No. of groups= 25&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>R Code:&lt;/strong>&lt;/p>&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code> #-- Balanced one-way analysis of variance power calculation --#
Pwr_Anova= pwr.anova.test(k =25, f =0.10, sig.level=0.05, power =0.80)
#-- Non-parametric Correction --#
print(paste0(&amp;quot;The Sample Size is :&amp;quot;,round((Pwr_Anova$n*1.15),0)))&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] &amp;quot;The Sample Size is :104&amp;quot;&lt;/code>&lt;/pre>&lt;/li>
&lt;/ul>&lt;/li>
&lt;/ul>
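For scenario (i), the quoted values of the eta-squared and f can be recomputed from the working-hours trial data with base R's <em>aov()</em> function alone (the data-frame and variable names here are illustrative):

```r
hours = data.frame(
  Group = rep(c("Faculty", "Staff", "Hourly"), each = 5),
  Hours = c(42, 45, 46, 55, 42,   # Faculty
            46, 45, 37, 42, 40,   # Staff
            29, 42, 33, 50, 23))  # Hourly

# Sums of squares from the one-way ANOVA table: SS_T (between) and SSE
ss   = summary(aov(Hours ~ Group, data = hours))[[1]][["Sum Sq"]]
eta2 = ss[1] / sum(ss)          # SS_T / TSS = 286.5 / 911.7, about 0.314
f    = sqrt(eta2 / (1 - eta2))  # about 0.677
```

These are the same eta-squared = 0.314 and f = 0.677 plugged into <em>pwr.anova.test()</em> in the solution above.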
&lt;/div>
&lt;div id="repeated-measures-anova" class="section level2">
&lt;h2>Repeated Measures ANOVA :&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Description:&lt;/strong> this tests whether at least one mean differs among groups, where the groups are repeated measurements (more than two) on a normally distributed variable. Repeated Measures ANOVA is the extension of the Paired T-test to more than two groups.&lt;/li>
&lt;/ul>
&lt;table>
&lt;thead>
&lt;tr class="header">
&lt;th>Numeric Var(s)&lt;/th>
&lt;th>Cat. Var(s)&lt;/th>
&lt;th>Cat. Var Group #&lt;/th>
&lt;th>Cat. Var # of interest&lt;/th>
&lt;th>Parametric&lt;/th>
&lt;th>Paired&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr class="odd">
&lt;td>1&lt;/td>
&lt;td>1&lt;/td>
&lt;td>&amp;gt; 2&lt;/td>
&lt;td>1&lt;/td>
&lt;td>Yes&lt;/td>
&lt;td>No&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;ul>
&lt;li>&lt;p>&lt;strong>Effect size calculation:&lt;/strong> &lt;span class="math display">\[Effect\:Size(f)=\frac{\sigma_m}{\sigma}\]&lt;/span> Where, &lt;span class="math display">\[\sigma_m=\sqrt{\frac{\sum_{j=1}^K{(m_j-m)^2}}{k}}= Standard\:Deviation\:of\:group\:means\]&lt;/span> &lt;span class="math display">\[m_j= j^{th}\:group\:mean\:,\:\:\forall\:j=1(1)K\]&lt;/span> &lt;span class="math display">\[m=Overall\:mean\]&lt;/span> &lt;span class="math display">\[K=number\:of\:groups\]&lt;/span> &lt;span class="math display">\[\sigma=overall\:standard\:deviation\]&lt;/span>&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>&lt;a href="Example:(1)" class="uri">Example:(1)&lt;/a>&lt;/strong> &lt;strong>Is there a difference in blood pressure at 1, 2, 3, and 4 months post-treatment?&lt;/strong>&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>Solution:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;p>Here, &lt;span class="math inline">\(H_0\::0\)&lt;/span> and &lt;span class="math inline">\(H_1\:: \neq 0\)&lt;/span>&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>1 group&lt;/strong>, &lt;strong>4 measurements&lt;/strong>&lt;/p>&lt;/li>
&lt;li>&lt;p>We will guess that the &lt;strong>effect sizes will be small.&lt;/strong>&lt;/p>&lt;/li>
&lt;li>&lt;p>For the effect size &lt;span class="math inline">\(f\)&lt;/span>: &lt;strong>0.1=small&lt;/strong>, &lt;strong>0.25=medium&lt;/strong>, and &lt;strong>0.4=large&lt;/strong> effect sizes.&lt;/p>&lt;/li>
&lt;li>&lt;p>For the nonsphericity correction coefficient, 1 means sphericity is met. There are methods to estimate it, but we will go with 1 for this example.&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>R Package:&lt;/strong> &lt;em>WebPower&lt;/em> Package&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>R function:&lt;/strong> &lt;em>wp.rmanova(ng = NULL, nm = NULL, f = NULL, nscor = 1, alpha = 0.05, power = NULL, type = 0) &lt;/em>&lt;/p>
&lt;ul>
&lt;li>&lt;p>ng= number of groups&lt;/p>&lt;/li>
&lt;li>&lt;p>nm= number of measurements&lt;/p>&lt;/li>
&lt;li>&lt;p>f= effect size&lt;/p>&lt;/li>
&lt;li>&lt;p>nscor= nonsphericity correction coefficient&lt;/p>&lt;/li>
&lt;li>&lt;p>alpha= significance level of the test&lt;/p>&lt;/li>
&lt;li>&lt;p>power= statistical power&lt;/p>&lt;/li>
&lt;li>&lt;p>type= (0,1,2) The value “0” is for between-effect; “1” is for within-effect; and “2” is for interaction effect.&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>Note:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Within-effects:&lt;/strong> variability of a particular value for individuals in a sample&lt;/li>
&lt;li>&lt;strong>Between-effects:&lt;/strong> examines differences between individuals&lt;/li>
&lt;/ul>&lt;/li>
&lt;/ul>&lt;/li>
&lt;li>&lt;p>&lt;strong>Answer of the problem:&lt;/strong>&lt;/p>&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code>library(WebPower)
print(paste0(&amp;quot;The Sample Size is :&amp;quot;,round(wp.rmanova(n=NULL, ng=1, nm=4, f=0.1, nscor=1, alpha=0.05, power=0.80, type=1)$n,0)))&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] &amp;quot;The Sample Size is :1092&amp;quot;&lt;/code>&lt;/pre>
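The effect size f defined above is easy to compute by hand in base R. The group means and overall SD below are purely hypothetical, chosen only to show the mechanics of the formula:

```r
# Hypothetical time-point means and an assumed overall SD
m     = c(10, 12, 14)   # group means (assumed)
sigma = 5               # overall standard deviation (assumed)

# sigma_m is the population-style SD of the group means (divide by K, not K-1)
sigma_m = sqrt(mean((m - mean(m))^2))
f = sigma_m / sigma     # about 0.327
```

Note the deliberate use of mean() of squared deviations rather than sd(), since the formula divides by K rather than K-1.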
&lt;ul>
&lt;li>&lt;p>Note:&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;a href="https://webpower.psychstat.org/wiki/manual/power_of_rmanova#power_curve">Power analysis for within-effect test&lt;/a>&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;a href="https://stattrek.com/anova/repeated-measures/sphericity.aspx">Sphericity and Repeated Measures ANOVA&lt;/a>&lt;/p>&lt;/li>
&lt;/ul>&lt;/li>
&lt;li>&lt;p>&lt;strong>&lt;a href="Example:(2)" class="uri">Example:(2)&lt;/a>&lt;/strong> &lt;strong>Calculate the sample size for the following scenarios (with α=0.05, and power=0.80):&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;p>&lt;strong>(i)&lt;/strong> You are interested in determining if there is a difference in blood serum levels at 6, 12, 18, and 24 months post-treatment. You collect the following trial data of blood serum in mg/dL.&lt;/p>
&lt;table>
&lt;thead>
&lt;tr class="header">
&lt;th align="center">Months&lt;/th>
&lt;th align="center">Blood Serum&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr class="odd">
&lt;td align="center">6 Months&lt;/td>
&lt;td align="center">38, 13, 32, 35, 21&lt;/td>
&lt;/tr>
&lt;tr class="even">
&lt;td align="center">12 Months&lt;/td>
&lt;td align="center">38, 44, 35, 48, 27&lt;/td>
&lt;/tr>
&lt;tr class="odd">
&lt;td align="center">18 Months&lt;/td>
&lt;td align="center">46, 15, 53, 51, 29&lt;/td>
&lt;/tr>
&lt;tr class="even">
&lt;td align="center">24 Months&lt;/td>
&lt;td align="center">52, 29, 60, 44, 36&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>&lt;/li>
&lt;li>&lt;p>&lt;strong>(ii)&lt;/strong> You are interested in determining if there is a difference in antibody levels at 1, 2, and 3 months post-treatment.&lt;/p>&lt;/li>
&lt;/ul>&lt;/li>
&lt;li>&lt;p>&lt;strong>Solution:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;ol style="list-style-type: lower-roman">
&lt;li>Here,&lt;/li>
&lt;/ol>
&lt;p>&lt;strong>Effect Size:&lt;/strong> &lt;span class="math inline">\(f =\sqrt{\frac{(27.8−37.3)^2+(38.4−37.3)^2+(38.8−37.3)^2+(25.2−37.3)^2}{4}}/ 12.74 = 0.608\)&lt;/span>&lt;/p>
&lt;ul>
&lt;li>To check sphericity, run a repeated-measures ANOVA:&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code>library(ez)&lt;/code>&lt;/pre>
&lt;pre class="r">&lt;code>data=data.frame(Patient= factor(rep(c(1,2,3,4,5),4)),
Month= factor(c(rep(&amp;quot;6 Months&amp;quot;,5),rep(&amp;quot;12 Months&amp;quot;,5),rep(&amp;quot;18 Months&amp;quot;,5),rep(&amp;quot;24 Months&amp;quot;,5))),
Serum= c(38,13,32,35,21,38,44,35,48,27,46,15,53,51,29,52,29,60,44,36))
anova3= ezANOVA(data, dv=Serum, wid=Patient, within=.(Month),detailed=TRUE)
anova3&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## $ANOVA
## Effect DFn DFd SSn SSd F p p&amp;lt;.05 ges
## 1 (Intercept) 1 4 27825.8 1506.7 73.872171 0.001006882 * 0.9212804
## 2 Month 3 12 706.6 870.9 3.245378 0.060146886 0.2291032
##
## $`Mauchly&amp;#39;s Test for Sphericity`
## Effect W p p&amp;lt;.05
## 2 Month 0.1556327 0.4348287
##
## $`Sphericity Corrections`
## Effect GGe p[GG] p[GG]&amp;lt;.05 HFe p[HF] p[HF]&amp;lt;.05
## 2 Month 0.4844127 0.1187469 0.6892662 0.09014564&lt;/code>&lt;/pre>
&lt;ul>
&lt;li>&lt;p>Note: &lt;a href="https://people.umass.edu/bwdillon/LING609/Lectures/Section3/Lecture21.html">For more details about &lt;em>ezANOVA() function&lt;/em> for Sphericity and Repeated Measures ANOVA&lt;/a>&lt;/p>&lt;/li>
&lt;li>&lt;p>Mauchly’s test for sphericity was non-significant (p = 0.43), so a correction coefficient of 1 was used.&lt;/p>&lt;/li>
&lt;li>&lt;p>One group, four measurements, within-effects so type 1&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>R Code:&lt;/strong>&lt;/p>&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code> print(paste0(&amp;quot;The Sample Size is :&amp;quot;,round(wp.rmanova(n=NULL, ng=1, nm=4, f=0.608, nscor=1, alpha=0.05, power=0.80, type=1)$n,0)))&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] &amp;quot;The Sample Size is :31&amp;quot;&lt;/code>&lt;/pre>&lt;/li>
&lt;li>&lt;ol start="2" style="list-style-type: lower-roman">
&lt;li>You are interested in determining if there is a difference in antibody levels at 1, 2, and 3 months post-treatment.&lt;/li>
&lt;/ol>
&lt;ul>
&lt;li>&lt;p>Guess a nonsphericity correction of 1 and a medium effect size (0.25)&lt;/p>&lt;/li>
&lt;li>&lt;p>One group, three measurements, type 1&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>R Code:&lt;/strong>&lt;/p>&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code> print(paste(&amp;quot;The Sample Size is :&amp;quot;,round(wp.rmanova(n=NULL, ng=1, nm=3, f=0.25, nscor=1, alpha=0.05, power=0.80, type=1)$n,0)))&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] &amp;quot;The Sample Size is : 156&amp;quot;&lt;/code>&lt;/pre>&lt;/li>
&lt;/ul>&lt;/li>
&lt;/ul>
&lt;/div></description></item><item><title>Simulation &amp; Statistics in R</title><link>https://rajeshmajumderblog.netlify.app/blog/internal-project_vi/</link><pubDate>Tue, 26 Oct 2021 00:00:00 +0000</pubDate><guid>https://rajeshmajumderblog.netlify.app/blog/internal-project_vi/</guid><description>
&lt;script src="https://rajeshmajumderblog.netlify.app/blog/internal-project_vi/index_files/header-attrs/header-attrs.js">&lt;/script>
&lt;div id="TOC">
&lt;ul>
&lt;li>&lt;a href="#introduction">INTRODUCTION&lt;/a>&lt;/li>
&lt;li>&lt;a href="#concept-of-simulation">Concept Of Simulation&lt;/a>&lt;/li>
&lt;li>&lt;a href="#why-do-we-simulate">Why do we simulate ?&lt;/a>&lt;/li>
&lt;li>&lt;a href="#drawing-of-simple-random-sample">Drawing of Simple Random Sample&lt;/a>&lt;/li>
&lt;li>&lt;a href="#example">Example&lt;/a>&lt;/li>
&lt;li>&lt;a href="#unequal-probability-sampling">Unequal Probability Sampling&lt;/a>&lt;/li>
&lt;li>&lt;a href="#similating-coin-tosses">Similating Coin Tosses&lt;/a>&lt;/li>
&lt;li>&lt;a href="#find-the-proportion-of-heads-tails-in-long-run">Find the Proportion of heads &amp;amp; tails in long run&lt;/a>&lt;/li>
&lt;li>&lt;a href="#find-the-proportion-of-heads-tails-in-long-run-1">Find the Proportion of heads &amp;amp; tails in long run&lt;/a>&lt;/li>
&lt;li>&lt;a href="#finding-probabilities">Finding Probabilities&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#fact">Fact&lt;/a>&lt;/li>
&lt;/ul>&lt;/li>
&lt;li>&lt;a href="#drawing-a-card">Drawing a Card&lt;/a>&lt;/li>
&lt;li>&lt;a href="#divisibility-test">Divisibility Test&lt;/a>&lt;/li>
&lt;li>&lt;a href="#urn-ball-problem">Urn-Ball Problem&lt;/a>&lt;/li>
&lt;li>&lt;a href="#urn-ball-problem-1">Urn-Ball Problem&lt;/a>&lt;/li>
&lt;li>&lt;a href="#birthday-problem">Birthday Problem&lt;/a>&lt;/li>
&lt;li>&lt;a href="#card-shiffting">Card Shiffting&lt;/a>&lt;/li>
&lt;li>&lt;a href="#cut-shuffle">Cut Shuffle&lt;/a>&lt;/li>
&lt;li>&lt;a href="#simulating-a-cut-shuffle">Simulating a Cut Shuffle&lt;/a>&lt;/li>
&lt;li>&lt;a href="#riffle-shuffle">Riffle Shuffle&lt;/a>&lt;/li>
&lt;li>&lt;a href="#simulating-riffle-shuffle">Simulating Riffle Shuffle&lt;/a>&lt;/li>
&lt;li>&lt;a href="#simulating-riffle-shuffle-1">Simulating Riffle Shuffle&lt;/a>&lt;/li>
&lt;li>&lt;a href="#simulating-random-variables">Simulating Random Variables&lt;/a>&lt;/li>
&lt;li>&lt;a href="#using-it-farther">Using it farther&lt;/a>&lt;/li>
&lt;li>&lt;a href="#much-complicated-ones">Much Complicated Ones&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#fact-1">Fact&lt;/a>&lt;/li>
&lt;li>&lt;a href="#algorithm">Algorithm&lt;/a>&lt;/li>
&lt;/ul>&lt;/li>
&lt;li>&lt;a href="#generating-poisson-distribution">Generating Poisson Distribution&lt;/a>&lt;/li>
&lt;li>&lt;a href="#continuous-distributions">Continuous Distributions&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#fact-2">Fact&lt;/a>&lt;/li>
&lt;/ul>&lt;/li>
&lt;li>&lt;a href="#working-with-inbuilt-r-functions">Working with inbuilt R functions&lt;/a>&lt;/li>
&lt;li>&lt;a href="#plotting-the-normal-density">Plotting the normal density&lt;/a>&lt;/li>
&lt;li>&lt;a href="#other-standard-distributions-in-r">Other Standard Distributions in R&lt;/a>&lt;/li>
&lt;li>&lt;a href="#central-limit-theorem">Central Limit Theorem&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#theorem">Theorem&lt;/a>&lt;/li>
&lt;/ul>&lt;/li>
&lt;li>&lt;a href="#law-of-laege-numbers">Law of Laege Numbers&lt;/a>&lt;/li>
&lt;li>&lt;a href="#plotting-the-probability">Plotting the Probability&lt;/a>&lt;/li>
&lt;li>&lt;a href="#strong-law-of-large-numbers">Strong Law of large numbers&lt;/a>&lt;/li>
&lt;li>&lt;a href="#illustrating-strong-law">Illustrating Strong Law&lt;/a>&lt;/li>
&lt;li>&lt;a href="#family-planning">Family Planning&lt;/a>&lt;/li>
&lt;li>&lt;a href="#using-simulation-to-construct-tests">Using Simulation to construct Tests&lt;/a>&lt;/li>
&lt;li>&lt;a href="#plot-of-beta-densities">Plot of Beta Densities&lt;/a>&lt;/li>
&lt;li>&lt;a href="#what-type-of-test-shall-we-perform">What type of test shall we perform ?&lt;/a>&lt;/li>
&lt;li>&lt;a href="#now-lets-find-the-c">Now lets find the c&lt;/a>&lt;/li>
&lt;li>&lt;a href="#generating-normal-variables">Generating Normal Variables&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#fact-box-muller-transformation">Fact: Box-Muller transformation&lt;/a>&lt;/li>
&lt;/ul>&lt;/li>
&lt;li>&lt;a href="#generating-bivariate-normal-variables">Generating Bivariate Normal Variables&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#fact-3">Fact:&lt;/a>&lt;/li>
&lt;/ul>&lt;/li>
&lt;li>&lt;a href="#monte-carlo-simulation">Monte Carlo Simulation&lt;/a>&lt;/li>
&lt;li>&lt;a href="#bias-variance">Bias &amp;amp; Variance&lt;/a>&lt;/li>
&lt;li>&lt;a href="#monte-carlo-integration">Monte Carlo Integration&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#example-1">Example&lt;/a>&lt;/li>
&lt;li>&lt;a href="#another-example">Another Example&lt;/a>&lt;/li>
&lt;/ul>&lt;/li>
&lt;li>&lt;a href="#an-assignment-problem">An assignment Problem&lt;/a>&lt;/li>
&lt;li>&lt;a href="#brownian-motion">Brownian Motion&lt;/a>&lt;/li>
&lt;/ul>
&lt;/div>
&lt;div id="introduction" class="section level2">
&lt;h2>INTRODUCTION&lt;/h2>
&lt;p>As statisticians, we often deal with random experiments. There are various techniques to predict the outcomes of such experiments :&lt;/p>
&lt;ul>
&lt;li>&lt;p>&lt;strong>Wait and See:&lt;/strong> Designing winning strategies by trial-and-error method.&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>Solving Probability Models:&lt;/strong> Assume a definite mathematical model to predict outcome, sometimes gets complicated.&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>Simulate Probability Models:&lt;/strong> Also start with a mathematical model, but instead of solving it mathematically we use computers to perform the virtual random experiment following that model, and then analyze the artificial data the computers generate. Similar to “wait and see”, except that we do not need to wait for reality.&lt;/p>&lt;/li>
&lt;/ul>
&lt;/div>
&lt;div id="concept-of-simulation" class="section level2">
&lt;h2>Concept Of Simulation&lt;/h2>
&lt;ul>
&lt;li>&lt;p>Assume a mathematical model.&lt;/p>&lt;/li>
&lt;li>&lt;p>Use computers to perform the random experiment artificially.&lt;/p>&lt;/li>
&lt;li>&lt;p>Computers can perform artificial random experiments because they can generate random numbers.&lt;/p>&lt;/li>
&lt;li>&lt;p>Use the artificial data generated by the computers to analyze the model and predict the outcome.&lt;/p>&lt;/li>
&lt;li>&lt;p>Note that &lt;em>the random numbers generated by computers are not random in the absolute sense; they are only pseudo-random numbers.&lt;/em>&lt;/p>&lt;/li>
&lt;/ul>
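&lt;p>For instance, since the generated numbers are only pseudo-random, fixing the seed makes a “random” experiment exactly reproducible. A minimal sketch in base R:&lt;/p>
&lt;pre class="r">&lt;code>set.seed(42)
x=runif(3)      #-- three pseudo-random U(0,1) numbers
set.seed(42)    #-- reset the generator to the same state
y=runif(3)
identical(x,y)  #-- TRUE: the same seed reproduces the same stream&lt;/code>&lt;/pre>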
&lt;/div>
&lt;div id="why-do-we-simulate" class="section level2">
&lt;h2>Why do we simulate ?&lt;/h2>
&lt;ul>
&lt;li>&lt;p>To have a better understanding of the known probability models.&lt;/p>&lt;/li>
&lt;li>&lt;p>To visualize a probability model with examples of outcome of a random experiment ( &lt;em>which in reality are hard to obtain&lt;/em> )&lt;/p>&lt;/li>
&lt;li>&lt;p>To have an idea about the result of a statistical model which cannot be solved explicitly using formula.&lt;/p>&lt;/li>
&lt;li>&lt;p>To judge the performance of a model before applying it to a real data situation.&lt;/p>&lt;/li>
&lt;/ul>
&lt;/div>
&lt;div id="drawing-of-simple-random-sample" class="section level2">
&lt;h2>Drawing of Simple Random Sample&lt;/h2>
&lt;ul>
&lt;li>We use the sample() command for both &lt;strong>with-replacement&lt;/strong> &amp;amp; &lt;strong>without-replacement&lt;/strong> sampling.&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code>set.seed(123)
sample(c(&amp;quot;A&amp;quot;,&amp;quot;B&amp;quot;,&amp;quot;C&amp;quot;,&amp;quot;D&amp;quot;,&amp;quot;E&amp;quot;),size = 3,replace = F) #-- Without replacement&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] &amp;quot;C&amp;quot; &amp;quot;B&amp;quot; &amp;quot;E&amp;quot;&lt;/code>&lt;/pre>
&lt;pre class="r">&lt;code>set.seed(123)
sample(c(&amp;quot;A&amp;quot;,&amp;quot;B&amp;quot;,&amp;quot;C&amp;quot;,&amp;quot;D&amp;quot;,&amp;quot;E&amp;quot;),size = 3,replace = T) #-- With replacement&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] &amp;quot;C&amp;quot; &amp;quot;C&amp;quot; &amp;quot;B&amp;quot;&lt;/code>&lt;/pre>
&lt;/div>
&lt;div id="example" class="section level2">
&lt;h2>Example&lt;/h2>
&lt;pre class="r">&lt;code>set.seed(5)
sample(1:10,size=2,replace = T)&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] 2 9&lt;/code>&lt;/pre>
&lt;pre class="r">&lt;code>set.seed(6)
sample(100,size=5)&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] 53 10 45 78 56&lt;/code>&lt;/pre>
&lt;/div>
&lt;div id="unequal-probability-sampling" class="section level2">
&lt;h2>Unequal Probability Sampling&lt;/h2>
&lt;pre class="r">&lt;code>set.seed(7)
sample(c(&amp;quot;A&amp;quot;,&amp;quot;B&amp;quot;,&amp;quot;C&amp;quot;),size = 2,prob = c(0.1,0.4,0.5))&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] &amp;quot;A&amp;quot; &amp;quot;C&amp;quot;&lt;/code>&lt;/pre>
&lt;/div>
&lt;div id="similating-coin-tosses" class="section level2">
&lt;h2>Simulating Coin Tosses&lt;/h2>
&lt;ul>
&lt;li>An unbiased coin is tossed 10 times. Let’s see the output of the tosses.&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code>set.seed(100)
sample(c(&amp;quot;H&amp;quot;,&amp;quot;T&amp;quot;),10,replace = T)&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] &amp;quot;T&amp;quot; &amp;quot;H&amp;quot; &amp;quot;T&amp;quot; &amp;quot;T&amp;quot; &amp;quot;H&amp;quot; &amp;quot;H&amp;quot; &amp;quot;T&amp;quot; &amp;quot;T&amp;quot; &amp;quot;T&amp;quot; &amp;quot;H&amp;quot;&lt;/code>&lt;/pre>
&lt;ul>
&lt;li>Suppose now the probability of a head is 2/6&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code>set.seed(100)
sample(c(&amp;quot;H&amp;quot;,&amp;quot;T&amp;quot;),10,replace = T,prob = c(2/6,4/6))&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] &amp;quot;T&amp;quot; &amp;quot;T&amp;quot; &amp;quot;T&amp;quot; &amp;quot;T&amp;quot; &amp;quot;T&amp;quot; &amp;quot;T&amp;quot; &amp;quot;H&amp;quot; &amp;quot;T&amp;quot; &amp;quot;T&amp;quot; &amp;quot;T&amp;quot;&lt;/code>&lt;/pre>
&lt;/div>
&lt;div id="find-the-proportion-of-heads-tails-in-long-run" class="section level2">
&lt;h2>Find the Proportion of heads &amp;amp; tails in long run&lt;/h2>
&lt;pre class="r">&lt;code>prop=NULL
size1=seq(100,10000,by=1000)
size2=seq(20000,500000,by=10000)
size=c(size1,size2)
for (n in size)
{
x=sample(0:1,n,rep=T)
prop=c(prop,sum(x)/n)
}
plot(size,prop,type=&amp;quot;l&amp;quot;)
abline(0.5,0)&lt;/code>&lt;/pre>
&lt;/div>
&lt;div id="find-the-proportion-of-heads-tails-in-long-run-1" class="section level2">
&lt;h2>Find the Proportion of heads &amp;amp; tails in long run&lt;/h2>
&lt;pre>&lt;code>## [1] 100 1100 2100 3100 4100 5100 6100 7100 8100 9100
## [11] 20000 30000 40000 50000 60000 70000 80000 90000 100000 110000
## [21] 120000 130000 140000 150000 160000 170000 180000 190000 200000 210000
## [31] 220000 230000 240000 250000 260000 270000 280000 290000 300000 310000
## [41] 320000 330000 340000 350000 360000 370000 380000 390000 400000 410000
## [51] 420000 430000 440000 450000 460000 470000 480000 490000 500000&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] 0.4800000 0.5100000 0.4985714 0.4880645 0.5026829 0.4905882 0.5039344
## [8] 0.5021127 0.5004938 0.5057143 0.4978500 0.5019333 0.5033500 0.5027600
## [15] 0.5003167 0.5004000 0.4984125 0.5031000 0.4990700 0.4997818 0.4995667
## [22] 0.4988692 0.5021143 0.5017067 0.5019000 0.4986529 0.5001111 0.5005684
## [29] 0.5001250 0.4984714 0.4992182 0.4990478 0.4965500 0.4987200 0.4986769
## [36] 0.4991741 0.4989179 0.5002103 0.4991067 0.4998323 0.5003156 0.4998909
## [43] 0.4985824 0.4995286 0.5017111 0.5003432 0.4990737 0.5005205 0.4994575
## [50] 0.4997585 0.4988833 0.4997023 0.5001773 0.5009356 0.5003457 0.5004979
## [57] 0.5000729 0.4997633 0.4996940&lt;/code>&lt;/pre>
&lt;p>&lt;img src="https://rajeshmajumderblog.netlify.app/blog/internal-project_vi/index_files/figure-html/unnamed-chunk-9-1.png" width="672" />&lt;/p>
&lt;/div>
&lt;div id="finding-probabilities" class="section level2">
&lt;h2>Finding Probabilities&lt;/h2>
&lt;div id="fact" class="section level3">
&lt;h3>Fact&lt;/h3>
&lt;p>Probability of any event A can be interpreted as the long-term relative frequency of the event A, i.e.,
&lt;span class="math inline">\(\frac{no.\;of\;repetitions\;resulting\;in\;A}{total\;number\;of\;repetitions}\)&lt;/span>
&lt;span class="math inline">\(as\;n\rightarrow\infty\)&lt;/span>&lt;/p>
&lt;ul>
&lt;li>Hence for computing the probability of any event A by simulation, we shall simulate a large number &lt;span class="math inline">\(n\)&lt;/span> of cases and count the number of times the event A has occurred. If this number is &lt;span class="math inline">\(m\)&lt;/span>, then the probability of the event A can be approximated by &lt;span class="math inline">\(\frac{m}{n}\)&lt;/span>&lt;/li>
&lt;/ul>
&lt;/div>
&lt;/div>
&lt;div id="drawing-a-card" class="section level2">
&lt;h2>Drawing a Card&lt;/h2>
&lt;ul>
&lt;li>A card is drawn from a full pack of 52 cards. Find the probability that the drawn card is a picture card (i.e., king, queen or jack).&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code>set.seed(125)
pic=NULL
for(i in 1:10000)
{
x=sample(52,size = 1)
if(any(x%%13==c(11,12,0)))
{
pic[i]=1
}
else pic[i]=0
}
sum(pic)/10000&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] 0.2287&lt;/code>&lt;/pre>
&lt;ul>
&lt;li>Because, in each suit, &lt;strong>jack&lt;/strong> = &lt;span class="math inline">\(11^{th}\)&lt;/span> card, &lt;strong>queen&lt;/strong> = &lt;span class="math inline">\(12^{th}\)&lt;/span> card, and &lt;strong>king&lt;/strong> = &lt;span class="math inline">\(13^{th}\)&lt;/span> card (and &lt;span class="math inline">\(13 \bmod 13 = 0\)&lt;/span>, which is why the code checks for remainders 11, 12 and 0).&lt;/li>
&lt;/ul>
&lt;/div>
&lt;div id="divisibility-test" class="section level2">
&lt;h2>Divisibility Test&lt;/h2>
&lt;ul>
&lt;li>A number is chosen at random from 1 to 1000. Find the probability that it is divisible by 3, 5 or 6.&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code>count=0
for(i in 1:100000)
{
num=sample(1000,1)
if(num%%3==0||num%%5==0||num%%6==0)
{
count=count+1
}
}
count/100000&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] 0.46794&lt;/code>&lt;/pre>
&lt;/div>
&lt;div id="urn-ball-problem" class="section level2">
&lt;h2>Urn-Ball Problem&lt;/h2>
&lt;ul>
&lt;li>Suppose an urn contains 7 white and 5 black balls. 3 balls are chosen at random without replacement. Find the probability that :
&lt;ul>
&lt;li>all the 3 balls are white&lt;/li>
&lt;li>2 are white and 1 is black.&lt;/li>
&lt;/ul>&lt;/li>
&lt;/ul>
&lt;/div>
&lt;div id="urn-ball-problem-1" class="section level2">
&lt;h2>Urn-Ball Problem&lt;/h2>
&lt;pre class="r">&lt;code>count1=0; count2=0
balls= as.factor(c(rep(&amp;quot;W&amp;quot;,7),rep(&amp;quot;B&amp;quot;,5)))
for ( i in 1:10000)
{
chosen= sample(balls,3)
if (all(chosen==&amp;quot;W&amp;quot;)) count1=count1+1
if (table(chosen)[&amp;quot;W&amp;quot;]==2) count2=count2+1
}
count1/10000; count2/10000&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] 0.1565&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] 0.4859&lt;/code>&lt;/pre>
&lt;/div>
&lt;div id="birthday-problem" class="section level2">
&lt;h2>Birthday Problem&lt;/h2>
&lt;ul>
&lt;li>In a class of 25 students, find the probability that at least two students share the same birthday.&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code>count=0
for(i in 1:500)
{
##-- drawing samples by SRSWR --##
class=sample(365,25,replace = T)
if(length(unique(class))&amp;lt;length(class))
{
count=count+1
}
}
count/500&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] 0.574&lt;/code>&lt;/pre>
&lt;/div>
&lt;div id="card-shiffting" class="section level2">
&lt;h2>Card Shuffling&lt;/h2>
&lt;ul>
&lt;li>&lt;p>Often we speak of a well-shuffled deck of cards.&lt;/p>&lt;/li>
&lt;li>&lt;p>When we shuffle a deck by hand, the shuffling is always imperfect (not random)&lt;/p>&lt;/li>
&lt;li>&lt;p>We can simulate this imperfect shuffling on a computer.&lt;/p>&lt;/li>
&lt;/ul>
&lt;/div>
&lt;div id="cut-shuffle" class="section level2">
&lt;h2>Cut Shuffle&lt;/h2>
&lt;ul>
&lt;li>&lt;p>The simplest method is “cutting” the deck.&lt;/p>&lt;/li>
&lt;li>&lt;p>We cut the deck at some random point chosen somewhere around the middle of the deck.&lt;/p>&lt;/li>
&lt;li>&lt;p>Then put the lower part on the top of the upper part.&lt;/p>&lt;/li>
&lt;li>&lt;p>We shall simulate this shuffle.&lt;/p>&lt;/li>
&lt;/ul>
&lt;/div>
&lt;div id="simulating-a-cut-shuffle" class="section level2">
&lt;h2>Simulating a Cut Shuffle&lt;/h2>
&lt;pre class="r">&lt;code>cut=function(deck)
{
#choose a random cut point near middle
x=rbinom(1,52,0.5)
temp=c(deck[(x+1):52],deck[1:x])
return(temp)
}
cut(1:52)&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
## [26] 49 50 51 52 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
## [51] 22 23&lt;/code>&lt;/pre>
&lt;/div>
&lt;div id="riffle-shuffle" class="section level2">
&lt;h2>Riffle Shuffle&lt;/h2>
&lt;ul>
&lt;li>&lt;p>A much more reliable way is the riffle shuffle ( &lt;em>also known as the dovetail shuffle&lt;/em>)&lt;/p>&lt;/li>
&lt;li>&lt;p>First split the deck into two parts just as in the cut method.&lt;/p>&lt;/li>
&lt;li>&lt;p>Take the top half in your left hand, and the other half in your right.&lt;/p>&lt;/li>
&lt;li>&lt;p>Release the cards randomly from both the hands.&lt;/p>&lt;/li>
&lt;li>&lt;p>Mathematically, if at any stage there are &lt;span class="math inline">\(a\)&lt;/span> cards in your left hand and &lt;span class="math inline">\(b\)&lt;/span> cards in your right, then the next card comes from the left hand with probability &lt;span class="math inline">\(\frac{a}{a+b}\)&lt;/span> and from the right with probability &lt;span class="math inline">\(\frac{b}{a+b}\)&lt;/span>.&lt;/p>&lt;/li>
&lt;/ul>
&lt;/div>
&lt;div id="simulating-riffle-shuffle" class="section level2">
&lt;h2>Simulating Riffle Shuffle&lt;/h2>
&lt;pre class="r">&lt;code>riffle=function(deck)
{
n=length(deck)
x=rbinom(1,52,0.5)
left=deck[1:x]; right=deck[(x+1):52]; k=0;
a=length(left); b=length(right); tab=NULL;
for(i in 1:52 )
{
ind=rbinom(1,1,a/(a+b))
if(ind==1)
{
tab[k+1]=left[a]
left=left[1:(a-1)]
k=k+1; a=a-1
}
else
{
tab[k+1]=right[b]
right=right[1:(b-1)]
k=k+1; b=b-1
}
}
return(tab)
}&lt;/code>&lt;/pre>
&lt;/div>
&lt;div id="simulating-riffle-shuffle-1" class="section level2">
&lt;h2>Simulating Riffle Shuffle&lt;/h2>
&lt;pre class="r">&lt;code>riffle(1:52)&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] 26 52 25 51 50 49 48 24 23 22 21 20 47 19 18 46 17 16 45 15 44 43 14 42 41
## [26] 40 39 13 12 11 38 10 9 8 7 37 6 36 35 34 33 5 4 32 3 31 30 2 29 28
## [51] 27 1&lt;/code>&lt;/pre>
&lt;/div>
&lt;div id="simulating-random-variables" class="section level2">
&lt;h2>Simulating Random Variables&lt;/h2>
&lt;ul>
&lt;li>&lt;p>We can simulate a Uniform(0,1) variable by the command &lt;strong>runif()&lt;/strong>&lt;/p>&lt;/li>
&lt;li>&lt;p>This can be used to generate random variables from other discrete and continuous distributions as well.&lt;/p>&lt;/li>
&lt;li>&lt;p>Suppose we want to generate a Bernoulli random variable with probability of success 0.7&lt;/p>&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code>bernoulli=function(prob)
{
u=runif(1); x=NULL;
if(u&amp;lt;prob) x=1
else x=0
return(x)
}
bernoulli(0.7)&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] 0&lt;/code>&lt;/pre>
&lt;/div>
&lt;div id="using-it-farther" class="section level2">
&lt;h2>Using it further&lt;/h2>
&lt;ul>
&lt;li>Suppose we want to simulate a Geometric(0.8) random variable.&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code>x=1
y=bernoulli(0.8)
while(y!=1)
{
y=bernoulli(0.8)
x=x+1
}
x&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] 1&lt;/code>&lt;/pre>
&lt;/div>
&lt;div id="much-complicated-ones" class="section level2">
&lt;h2>Much Complicated Ones&lt;/h2>
&lt;ul>
&lt;li>&lt;p>How can we generate a Poisson or a Hypergeometric random variable using the above technique ?&lt;/p>&lt;/li>
&lt;li>&lt;p>For this we need to take the help of the following fact :&lt;/p>&lt;/li>
&lt;/ul>
&lt;div id="fact-1" class="section level3">
&lt;h3>Fact&lt;/h3>
&lt;p>Suppose we want to generate &lt;span class="math inline">\(X\)&lt;/span> having p.m.f.
&lt;span class="math inline">\(P(X=x_i)=p_i\;\;\forall i=0,1,2,...\;\;\sum{p_i}=1\)&lt;/span>. We generate &lt;span class="math inline">\(U\sim Uni(0,1)\)&lt;/span> and set
&lt;span class="math display">\[X = \left\{ \begin{array}{rcl}
x_0 &amp;amp; if &amp;amp; U&amp;lt;p_0\\ x_1 &amp;amp; if &amp;amp; p_0\leqslant U&amp;lt;{p_0+p_1} \\.&amp;amp;.\\.&amp;amp;.\\x_i &amp;amp; if &amp;amp; \sum^{i-1}_{j=0}p_j\leqslant U&amp;lt;\sum^{i}_{j=0}p_j\\. &amp;amp;.\\. &amp;amp;.\\. &amp;amp;.\end{array}\right.\]&lt;/span>&lt;/p>
&lt;/div>
&lt;div id="algorithm" class="section level3">
&lt;h3>Algorithm&lt;/h3>
&lt;ul>
&lt;li>&lt;p>The preceding fact can be written as :&lt;/p>
&lt;ul>
&lt;li>Generate a random &lt;span class="math inline">\(U\sim U(0,1)\)&lt;/span>&lt;/li>
&lt;li>If &lt;span class="math inline">\(U&amp;lt;p_0\)&lt;/span> stop and set &lt;span class="math inline">\(X=x_0\)&lt;/span>&lt;/li>
&lt;li>If &lt;span class="math inline">\(U&amp;lt;p_0+p_1\)&lt;/span> stop and set &lt;span class="math inline">\(X=x_1\)&lt;/span>&lt;/li>
&lt;li>If &lt;span class="math inline">\(U&amp;lt;p_0+p_1+p_2\)&lt;/span> stop and set &lt;span class="math inline">\(X=x_2\)&lt;/span>&lt;/li>
&lt;li>and so on…&lt;/li>
&lt;/ul>&lt;/li>
&lt;/ul>
&lt;/div>
&lt;/div>
&lt;div id="generating-poisson-distribution" class="section level2">
&lt;h2>Generating Poisson Distribution&lt;/h2>
&lt;pre class="r">&lt;code>poi_mass=function(x,lambda)
{
return(exp(-lambda)*(lambda^x)/factorial(x))
}
poi_sample=function(lambda)
{
U=runif(1); i=0; cumprob=poi_mass(0,lambda)
while(U&amp;gt;cumprob)
{
i=i+1
cumprob=cumprob+poi_mass(i,lambda)
}
return(i)
}
poi_sample(5)&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] 2&lt;/code>&lt;/pre>
&lt;/div>
&lt;div id="continuous-distributions" class="section level2">
&lt;h2>Continuous Distributions&lt;/h2>
&lt;ul>
&lt;li>For continuous distributions we use the following fact :&lt;/li>
&lt;/ul>
&lt;div id="fact-2" class="section level3">
&lt;h3>Fact&lt;/h3>
&lt;p>(&lt;em>Probability Integral Transformation&lt;/em>) If &lt;span class="math inline">\(X\)&lt;/span> has an absolutely continuous distribution with C.D.F. &lt;span class="math inline">\(F\)&lt;/span>, then &lt;span class="math inline">\(F(X)\)&lt;/span> has the &lt;span class="math inline">\(U(0,1)\)&lt;/span> distribution.&lt;/p>
&lt;ul>
&lt;li>&lt;p>Suppose we want to generate &lt;span class="math inline">\(X\)&lt;/span> from &lt;span class="math inline">\(Exp(\lambda)\)&lt;/span> distribution.&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;span class="math display">\[F(x)=1-e^{\lambda x}\;=&amp;gt; U\sim U(0,1)\]&lt;/span>&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;span class="math display">\[X=-\frac{1}{\lambda}ln(1-U)\]&lt;/span> is the required random variable.&lt;/p>&lt;/li>
&lt;/ul>
&lt;/div>
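&lt;p>As a sketch of this inverse-transform idea (an illustration using only runif(); the theoretical mean &lt;span class="math inline">\(1/\lambda\)&lt;/span> serves as a check):&lt;/p>
&lt;pre class="r">&lt;code>lambda=2
U=runif(100000)
X=-log(1-U)/lambda   #-- invert F(x)=1-exp(-lambda*x)
mean(X)              #-- should be close to 1/lambda = 0.5&lt;/code>&lt;/pre>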
&lt;/div>
&lt;div id="working-with-inbuilt-r-functions" class="section level2">
&lt;h2>Working with inbuilt R functions&lt;/h2>
&lt;ul>
&lt;li>Suppose we want to generate random variables from &lt;span class="math inline">\(N(\mu,{\sigma}^2)\)&lt;/span>&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code># 2 samples from N(5,2) Distribution
rnorm(n=2,mean=5,sd=sqrt(2))&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] 7.280889 6.274643&lt;/code>&lt;/pre>
&lt;ul>
&lt;li>Now let us find &lt;span class="math inline">\(P(X\leq x)\)&lt;/span> i.e., &lt;span class="math inline">\(\Phi(\frac{x-\mu}{\sigma})\)&lt;/span>&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code># P(X&amp;lt;=4) for N(5,2)
pnorm(4,mean=5,sd=sqrt(2))&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] 0.2397501&lt;/code>&lt;/pre>
&lt;pre class="r">&lt;code># P(X&amp;gt;7) for N(5,2)
pnorm(7,mean=5,sd=sqrt(2),lower.tail = F)&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] 0.0786496&lt;/code>&lt;/pre>
&lt;ul>
&lt;li>We can also compute the normal quantiles &lt;span class="math inline">\(z_\alpha\)&lt;/span>&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code># lower 0.05 point
qnorm(0.05,mean=5,sd=sqrt(2))&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] 2.673826&lt;/code>&lt;/pre>
&lt;pre class="r">&lt;code># lower 0.01 point
qnorm(0.01,mean=5,sd=sqrt(2),lower.tail = F)&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] 8.289953&lt;/code>&lt;/pre>
&lt;ul>
&lt;li>Can also compute normal density &lt;span class="math inline">\(\phi(\frac{x-\mu}{\sigma})\)&lt;/span>&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code># density at x=2
dnorm(2,mean = 5,sd=sqrt(2))&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] 0.02973257&lt;/code>&lt;/pre>
&lt;pre class="r">&lt;code># density at x=5
dnorm(5,mean=5,sd=sqrt(2))&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] 0.2820948&lt;/code>&lt;/pre>
&lt;/div>
&lt;div id="plotting-the-normal-density" class="section level2">
&lt;h2>Plotting the normal density&lt;/h2>
&lt;pre class="r">&lt;code>x=seq(-3,3,by=0.01); y=dnorm(x,0,1)
plot(x,y,type=&amp;quot;l&amp;quot;,main=&amp;quot;Density of N(0,1)&amp;quot;,ylab=expression(phi(x)))&lt;/code>&lt;/pre>
&lt;p>&lt;img src="https://rajeshmajumderblog.netlify.app/blog/internal-project_vi/index_files/figure-html/unnamed-chunk-28-1.png" width="672" />&lt;/p>
&lt;/div>
&lt;div id="other-standard-distributions-in-r" class="section level2">
&lt;h2>Other Standard Distributions in R&lt;/h2>
&lt;table>
&lt;thead>
&lt;tr class="header">
&lt;th>Distribution&lt;/th>
&lt;th align="left">Sample&lt;/th>
&lt;th align="left">P(X&amp;lt;=x)&lt;/th>
&lt;th align="center">z_alpha&lt;/th>
&lt;th align="center">Density&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr class="odd">
&lt;td>Binomial&lt;/td>
&lt;td align="left">rbinom(n,size,prob)&lt;/td>
&lt;td align="left">pbinom&lt;/td>
&lt;td align="center">qbinom&lt;/td>
&lt;td align="center">dbinom&lt;/td>
&lt;/tr>
&lt;tr class="even">
&lt;td>Poisson&lt;/td>
&lt;td align="left">rpois(n,lambda)&lt;/td>
&lt;td align="left">ppois&lt;/td>
&lt;td align="center">qpois&lt;/td>
&lt;td align="center">dpois&lt;/td>
&lt;/tr>
&lt;tr class="odd">
&lt;td>Neg.Binomial&lt;/td>
&lt;td align="left">rnbinom(n,size,prob,mu)&lt;/td>
&lt;td align="left">pnbinom&lt;/td>
&lt;td align="center">qnbinom&lt;/td>
&lt;td align="center">dnbinom&lt;/td>
&lt;/tr>
&lt;tr class="even">
&lt;td>Geometric&lt;/td>
&lt;td align="left">rgeom(n,prob)&lt;/td>
&lt;td align="left">pgeom&lt;/td>
&lt;td align="center">qgeom&lt;/td>
&lt;td align="center">dgeom&lt;/td>
&lt;/tr>
&lt;tr class="odd">
&lt;td>Hypergeometric&lt;/td>
&lt;td align="left">rhyper(nn,m,n,k)&lt;/td>
&lt;td align="left">phyper&lt;/td>
&lt;td align="center">qhyper&lt;/td>
&lt;td align="center">dhyper&lt;/td>
&lt;/tr>
&lt;tr class="even">
&lt;td>Uniform&lt;/td>
&lt;td align="left">runif(n,min=0,max=1)&lt;/td>
&lt;td align="left">punif&lt;/td>
&lt;td align="center">qunif&lt;/td>
&lt;td align="center">dunif&lt;/td>
&lt;/tr>
&lt;tr class="odd">
&lt;td>Exponential&lt;/td>
&lt;td align="left">rexp(n,rate=1)&lt;/td>
&lt;td align="left">pexp&lt;/td>
&lt;td align="center">qexp&lt;/td>
&lt;td align="center">dexp&lt;/td>
&lt;/tr>
&lt;tr class="even">
&lt;td>Cauchy&lt;/td>
&lt;td align="left">rcauchy(n,location=0,scale=1)&lt;/td>
&lt;td align="left">pcauchy&lt;/td>
&lt;td align="center">qcauchy&lt;/td>
&lt;td align="center">dcauchy&lt;/td>
&lt;/tr>
&lt;tr class="odd">
&lt;td>t&lt;/td>
&lt;td align="left">rt(n,df,ncp)&lt;/td>
&lt;td align="left">pt&lt;/td>
&lt;td align="center">qt&lt;/td>
&lt;td align="center">dt&lt;/td>
&lt;/tr>
&lt;tr class="even">
&lt;td>F&lt;/td>
&lt;td align="left">rf(n,df1,df2,ncp)&lt;/td>
&lt;td align="left">pf&lt;/td>
&lt;td align="center">qf&lt;/td>
&lt;td align="center">df&lt;/td>
&lt;/tr>
&lt;tr class="odd">
&lt;td>Chi-Square&lt;/td>
&lt;td align="left">rchisq(n,df,ncp=0)&lt;/td>
&lt;td align="left">pchisq&lt;/td>
&lt;td align="center">qchisq&lt;/td>
&lt;td align="center">dchisq&lt;/td>
&lt;/tr>
&lt;tr class="even">
&lt;td>Gamma&lt;/td>
&lt;td align="left">rgamma(n,shape,rate,schale)&lt;/td>
&lt;td align="left">pgamma&lt;/td>
&lt;td align="center">qgamma&lt;/td>
&lt;td align="center">dgamma&lt;/td>
&lt;/tr>
&lt;tr class="odd">
&lt;td>Beta&lt;/td>
&lt;td align="left">rbeta(n,shape1,shape2,ncp)&lt;/td>
&lt;td align="left">pbeta&lt;/td>
&lt;td align="center">qbeta&lt;/td>
&lt;td align="center">dbeta&lt;/td>
&lt;/tr>
&lt;tr class="even">
&lt;td>Multinomial&lt;/td>
&lt;td align="left">rmultinom(n,size,prob)&lt;/td>
&lt;td align="left">-&lt;/td>
&lt;td align="center">-&lt;/td>
&lt;td align="center">dmultinom&lt;/td>
&lt;/tr>
&lt;tr class="odd">
&lt;td>Mult.Normal&lt;/td>
&lt;td align="left">rmnnorm(n,mean,sigma)&lt;/td>
&lt;td align="left">-&lt;/td>
&lt;td align="center">-&lt;/td>
&lt;td align="center">dmvnorm&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;/div>
&lt;div id="central-limit-theorem" class="section level2">
&lt;h2>Central Limit Theorem&lt;/h2>
&lt;div id="theorem" class="section level3">
&lt;h3>Theorem&lt;/h3>
&lt;ul>
&lt;li>&lt;p>(&lt;em>iid case&lt;/em>) Let &lt;span class="math inline">\(X_1,X_2,...,X_n\)&lt;/span> be &lt;em>iid&lt;/em> random variables with mean &lt;span class="math inline">\(\mu\)&lt;/span> and variance &lt;span class="math inline">\({\sigma}^2&amp;lt;\infty\)&lt;/span> and &lt;span class="math display">\[S_n=X_1+X_2+...+X_n\]&lt;/span>. Then, &lt;span class="math inline">\(\frac{S_n-E(S_n)}{\sqrt{Var(S_n)}}\longrightarrow N(0,1)\)&lt;/span> as &lt;span class="math inline">\(n\longrightarrow \infty\)&lt;/span>.&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;span class="math inline">\(U_1,U_2,...,U_n\)&lt;/span> are &lt;em>iid&lt;/em> &lt;span class="math inline">\(U(0,1)\)&lt;/span> variables. Then
&lt;span class="math display">\[Z_n=\frac{U_1+U_2+...+U_n-\frac{n}{2}}{\sqrt{\frac{n}{12}}}\longrightarrow N(0,1)\;\;;\;as\;\;n\longrightarrow \infty\]&lt;/span>&lt;/p>&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code>n=100
k=10000
U=runif(n*k)
M=matrix(U,n,k)
X=apply(M,2,sum)
Z=(X-n/2)/sqrt(n/12)
par(mfrow=c(1,2))
hist(Z)
qqnorm(Z)
qqline(Z,col=&amp;quot;red&amp;quot;)&lt;/code>&lt;/pre>
&lt;p>&lt;img src="https://rajeshmajumderblog.netlify.app/blog/internal-project_vi/index_files/figure-html/unnamed-chunk-29-1.png" width="672" />&lt;/p>
&lt;/div>
&lt;/div>
&lt;div id="law-of-laege-numbers" class="section level2">
&lt;h2>Law of Large Numbers&lt;/h2>
&lt;ul>
&lt;li>&lt;p>&lt;strong>The Weak Law of Large Numbers&lt;/strong> says that for any &lt;span class="math inline">\(\epsilon&amp;gt;0\)&lt;/span> the sequence of probabilities &lt;span class="math display">\[P({|\frac{S_n}{n}-\mu|&amp;lt;\epsilon})\longrightarrow 1\;\;\;\;\;as \;\;n\longrightarrow \infty\]&lt;/span>&lt;/p>&lt;/li>
&lt;li>&lt;p>Consider i.i.d. coin flips, that is, Bernoulli trials with &lt;span class="math inline">\(p=\mu=\frac{1}2\)&lt;/span>&lt;/p>&lt;/li>
&lt;li>&lt;p>We find the &lt;span class="math inline">\(P({|\frac{S_n}{n}-\mu|&amp;lt;\epsilon})\)&lt;/span> in R and illustrate the limiting behavior, with &lt;span class="math inline">\(\epsilon=0.01\)&lt;/span>&lt;/p>&lt;/li>
&lt;/ul>
&lt;/div>
&lt;div id="plotting-the-probability" class="section level2">
&lt;h2>Plotting the Probability&lt;/h2>
&lt;pre class="r">&lt;code>wlln=function(n,eps,p)
{
pbinom(n*p+n*eps,n,p)-pbinom(n*p-n*eps,n,p)
}
prob=NULL
for(n in 1:10000)
{
prob[n]=wlln(n,eps=0.01,p=0.5)
}
plot(prob,type=&amp;quot;l&amp;quot;,xlab=&amp;quot;n&amp;quot;,ylab=expression(P(X&amp;lt;=x)))&lt;/code>&lt;/pre>
&lt;p>&lt;img src="https://rajeshmajumderblog.netlify.app/blog/internal-project_vi/index_files/figure-html/unnamed-chunk-30-1.png" width="672" />&lt;/p>
&lt;/div>
&lt;div id="strong-law-of-large-numbers" class="section level2">
&lt;h2>Strong Law of large numbers&lt;/h2>
&lt;ul>
&lt;li>&lt;p>The strong law of large numbers says that &lt;span class="math inline">\(\frac{S_n}n \longrightarrow \mu\;\;\;w.p.\;1\;;\;as\;\;n\longrightarrow \infty\)&lt;/span>&lt;/p>&lt;/li>
&lt;li>&lt;p>Consider i.i.d. coin flips, that is, Bernoulli trials with &lt;span class="math inline">\(p=\mu=\frac{1}2\)&lt;/span>&lt;/p>&lt;/li>
&lt;li>&lt;p>The sum &lt;span class="math inline">\(S_n\)&lt;/span> is a Binomial random variable.&lt;/p>&lt;/li>
&lt;/ul>
&lt;/div>
&lt;div id="illustrating-strong-law" class="section level2">
&lt;h2>Illustrating Strong Law&lt;/h2>
&lt;pre class="r">&lt;code>slln=function(n,p)
{
x=rbinom(1,size=n,prob=p)
return(x)
}
value=NULL
for(i in 1:10000)
{
value[i]=slln(i,0.5)/i
}
plot(value,type=&amp;quot;l&amp;quot;,xlab=&amp;quot;n&amp;quot;,ylab=&amp;quot;Sample mean&amp;quot;)
abline(h=0.5)&lt;/code>&lt;/pre>
&lt;p>&lt;img src="https://rajeshmajumderblog.netlify.app/blog/internal-project_vi/index_files/figure-html/unnamed-chunk-31-1.png" width="672" />&lt;/p>
&lt;/div>
&lt;div id="family-planning" class="section level2">
&lt;h2>Family Planning&lt;/h2>
&lt;ul>
&lt;li>Suppose a couple plans to have children until they have one child of each sex. Assuming male and female children are equally probable, how many children can they expect to have ?&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code>count=NULL
for(i in 1:1000)
{
child=sample(c(0,1),1)
while(length(unique(child))&amp;lt;2)
{
child=c(child,sample(c(0,1),1))
}
count[i]=length(child)
}
mean(count)&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] 3.08&lt;/code>&lt;/pre>
&lt;/div>
&lt;div id="using-simulation-to-construct-tests" class="section level2">
&lt;h2>Using Simulation to construct Tests&lt;/h2>
&lt;ul>
&lt;li>&lt;p>Simulation can be used to construct tests in situations where the &lt;strong>exact sampling distribution&lt;/strong> of the test statistic is hard to find even under the null hypothesis.&lt;/p>&lt;/li>
&lt;li>&lt;p>More specifically, we use simulation to find the &lt;span class="math inline">\(100\alpha\)&lt;/span>% cut-off points&lt;/p>&lt;/li>
&lt;li>&lt;p>Suppose we have a sample of size 101 from a Beta(5, b) distribution&lt;/p>&lt;/li>
&lt;li>&lt;p>We want to test &lt;span class="math inline">\(H_0:b=5\)&lt;/span> against &lt;span class="math inline">\(H_1:b&amp;lt;5\)&lt;/span>&lt;/p>&lt;/li>
&lt;li>&lt;p>To get an idea about the nature of the test we plot the density function for different values of b.&lt;/p>&lt;/li>
&lt;/ul>
&lt;/div>
&lt;div id="plot-of-beta-densities" class="section level2">
&lt;h2>Plot of Beta Densities&lt;/h2>
&lt;pre class="r">&lt;code>par(mfrow=c(1,3))
x=seq(0,1,by=0.01)
y1=dbeta(x,shape1 = 5,shape2 = 2)
y2=dbeta(x,shape1 = 5,shape2 = 5)
y3=dbeta(x,shape1 = 5,shape2 = 10)
plot(x,y1,type=&amp;quot;l&amp;quot;,xlab=&amp;quot;b&amp;lt;5&amp;quot;)
abline(v=median(rbeta(101,shape1 = 5,shape2 = 2)),col=&amp;quot;red&amp;quot;)
plot(x,y2,type=&amp;quot;l&amp;quot;,xlab=&amp;quot;b=5&amp;quot;)
abline(v=median(rbeta(101,shape1=5,shape2=5)),col=&amp;quot;red&amp;quot;)
plot(x,y3,type=&amp;quot;l&amp;quot;,xlab=&amp;quot;b&amp;gt;5&amp;quot;)
abline(v=median(rbeta(101,shape1=5,shape2=10)),col=&amp;quot;red&amp;quot;)&lt;/code>&lt;/pre>
&lt;p>&lt;img src="https://rajeshmajumderblog.netlify.app/blog/internal-project_vi/index_files/figure-html/unnamed-chunk-33-1.png" width="672" />&lt;/p>
&lt;/div>
&lt;div id="what-type-of-test-shall-we-perform" class="section level2">
&lt;h2>What type of test shall we perform ?&lt;/h2>
&lt;ul>
&lt;li>&lt;p>From the figures we see that sample median can be used as a
test statistic.&lt;/p>&lt;/li>
&lt;li>&lt;p>Also, a right-tailed test based on the median will be appropriate&lt;/p>&lt;/li>
&lt;li>&lt;p>Thus we shall reject &lt;span class="math inline">\(H_0\)&lt;/span> if the sample median exceeds some
value &lt;span class="math inline">\(c\)&lt;/span>.&lt;/p>&lt;/li>
&lt;li>&lt;p>We want to find the test at the 90% level of significance.&lt;/p>&lt;/li>
&lt;li>&lt;p>We shall use simulation technique to find the cut-off point &lt;span class="math inline">\(c\)&lt;/span>.&lt;/p>&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code>set.seed(100);prob=NULL; j=1
C=seq(0.2,0.9,by=0.001)
for ( c in C)
{
prob[j]=0
for(i in 1:100)
{
x=rbeta(101,shape1=5,shape2=5)
me=median(x)
if(me&amp;gt;c) prob[j]=prob[j]+1
}
prob[j]=prob[j]/100
j=j+1
}
plot(C,prob,type=&amp;quot;l&amp;quot;)
abline(h=0.9,col=&amp;quot;red&amp;quot;)&lt;/code>&lt;/pre>
&lt;p>&lt;img src="https://rajeshmajumderblog.netlify.app/blog/internal-project_vi/index_files/figure-html/unnamed-chunk-34-1.png" width="672" />&lt;/p>
&lt;/div>
&lt;div id="now-lets-find-the-c" class="section level2">
&lt;h2>Now lets find the c&lt;/h2>
&lt;p>We continue to search for the c for which &lt;span class="math inline">\(P_{H_0}(me&amp;gt;c)\)&lt;/span> is closest to 0.9&lt;/p>
&lt;pre class="r">&lt;code>C[which(prob&amp;gt;0.89 &amp;amp; prob&amp;lt;0.91)]&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] 0.473 0.475&lt;/code>&lt;/pre>
&lt;pre class="r">&lt;code>prob[which(C %in% C[which(prob&amp;gt;0.89 &amp;amp; prob&amp;lt;0.91)])]&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] 0.9 0.9&lt;/code>&lt;/pre>
&lt;p>So, we can take c to be 0.473.
Thus our test rule is: reject &lt;span class="math inline">\(H_0\)&lt;/span> if the sample median exceeds 0.473.&lt;/p>
&lt;/div>
&lt;div id="generating-normal-variables" class="section level2">
&lt;h2>Generating Normal Variables&lt;/h2>
&lt;ul>
&lt;li>Instead of using R’s built-in function, &lt;em>rnorm()&lt;/em>, we can generate Normal variables from scratch.&lt;/li>
&lt;/ul>
&lt;div id="fact-box-muller-transformation" class="section level3">
&lt;h3>Fact: Box-Muller transformation&lt;/h3>
&lt;p>Let &lt;span class="math inline">\(U_1\)&lt;/span>, &lt;span class="math inline">\(U_2\)&lt;/span> &lt;span class="math inline">\(\sim U(0,1)\)&lt;/span> independently, and define&lt;/p>
&lt;p>&lt;span class="math display">\[Z_1=\sqrt{-2 ln U_1} cos(2\pi U_2)\]&lt;/span>
&lt;span class="math display">\[Z_2=\sqrt{-2 ln U_1} sin(2\pi U_2)\]&lt;/span>&lt;/p>
&lt;p>Then, &lt;span class="math inline">\(Z_1,Z_2 \sim N(0,1)\)&lt;/span> independently.&lt;/p>
&lt;p>So, to generate &lt;span class="math inline">\(Y \sim N(\mu , {\sigma}^2)\)&lt;/span>; we use, &lt;span class="math inline">\(Y=\mu +\sigma Z\)&lt;/span> where, &lt;span class="math inline">\(Z \sim N(0,1)\)&lt;/span>.&lt;/p>
&lt;pre class="r">&lt;code>normal=function(n)
{
U1 = runif(n)
U2 = runif(n)
C = sqrt(-2*log(U1))
Z1 = C*cos(2*pi*U2)
Z2 = C*sin(2*pi*U2)
return(Z1)
}
n = 100000
Z = normal(n)
hist(Z,prob=T,col=rainbow(12))
#-- note that rainbow() is a graphics function which returns a vector of colors.
curve(dnorm(x),-3,3,
add=T,lwd=4)&lt;/code>&lt;/pre>
&lt;p>&lt;img src="https://rajeshmajumderblog.netlify.app/blog/internal-project_vi/index_files/figure-html/unnamed-chunk-37-1.png" width="672" />&lt;/p>
&lt;/div>
&lt;/div>
&lt;div id="generating-bivariate-normal-variables" class="section level2">
&lt;h2>Generating Bivariate Normal Variables&lt;/h2>
&lt;div id="fact-3" class="section level3">
&lt;h3>Fact:&lt;/h3>
&lt;p>If &lt;span class="math inline">\((X,Y) \sim N_2(0,0,1,1,\rho)\)&lt;/span>, then &lt;span class="math inline">\(Z_1=X\)&lt;/span> and &lt;span class="math inline">\(Z_2=\frac{(Y-\rho X)}{\sqrt{1-{\rho}^2}}\)&lt;/span> are iid &lt;span class="math inline">\(N(0,1)\)&lt;/span>, where, &lt;span class="math inline">\(-1&amp;lt;\rho&amp;lt;1\)&lt;/span>&lt;/p>
&lt;p>Equivalently, if we generate &lt;span class="math inline">\(Z_1\)&lt;/span>,&lt;span class="math inline">\(Z_2\)&lt;/span> iid &lt;span class="math inline">\(N(0,1)\)&lt;/span>, then setting &lt;span class="math inline">\(X=Z_1\)&lt;/span> and &lt;span class="math inline">\(Y=\rho Z_1+\sqrt{1-{\rho}^2}\;Z_2\)&lt;/span> gives a pair of random variables that have the &lt;span class="math inline">\(N_2(0,0,1,1,\rho)\)&lt;/span> distribution.&lt;/p>
&lt;pre class="r">&lt;code>binorm=function(n,rho)
{
x = numeric(n); y = numeric(n)
for (i in 1:n)
{
z1 = normal(1)
z2 = normal(1)
x[i] = z1
y[i] = rho*z1+sqrt(1-rho^2)*z2
}
return(cbind(x,y))
}
n = 1000 ;rho = -0.5
data = binorm(n,rho)
##-- Plotting Bivariate Normal Data --##
plot(data,
pch=19,
xlab=&amp;quot;X&amp;quot;,
ylab = &amp;quot;Y&amp;quot;)
abline(lm(data[,2]~data[,1]),col=&amp;quot;red&amp;quot;,
v=mean(data[,1]),
h=mean(data[,2]),
lwd=3)
legend(&amp;quot;topright&amp;quot;,legend = c(paste(&amp;quot;mean(X)= &amp;quot;,round(mean(data[,1]),3)),
paste(&amp;quot;Var(X)= &amp;quot;,round((sd(data[,1]))^2,3)),
paste(&amp;quot;mean(Y)= &amp;quot;,round(mean(data[,2]),3)),
paste(&amp;quot;Var(Y)= &amp;quot;,round((sd(data[,2]))^2,3)),
paste(&amp;quot;samp corr.= &amp;quot;,round(cor.test(data[,1],data[,2])$estimate,2))),
cex=0.66)&lt;/code>&lt;/pre>
&lt;p>&lt;img src="https://rajeshmajumderblog.netlify.app/blog/internal-project_vi/index_files/figure-html/unnamed-chunk-38-1.png" width="672" />&lt;/p>
&lt;/div>
&lt;/div>
&lt;div id="monte-carlo-simulation" class="section level2">
&lt;h2>Monte Carlo Simulation&lt;/h2>
&lt;ul>
&lt;li>&lt;p>We know that the sample average &lt;span class="math inline">\(\bar{x}\)&lt;/span> converges to the population mean, by the consistency property (the strong law of large numbers).&lt;/p>&lt;/li>
&lt;li>&lt;p>Thus expected value of any function can be approximated by the sample average.&lt;/p>&lt;/li>
&lt;li>&lt;p>Thus &lt;span class="math inline">\(\frac{1}{N} \sum_{i=1}^N{f(X_i)} \longrightarrow E(f(X))\)&lt;/span> with probability 1 as &lt;span class="math inline">\(N \longrightarrow \infty\)&lt;/span> if &lt;span class="math inline">\(X_1,X_2,...\)&lt;/span> are iid sequence of random variables with the same distribution as &lt;span class="math inline">\(X\)&lt;/span>.&lt;/p>&lt;/li>
&lt;li>&lt;p>A Monte Carlo method for estimating &lt;span class="math inline">\(E(f(X))\)&lt;/span> is a numerical method based on the approximation &lt;span class="math display">\[Z_N^{MC}=\frac{1}{N} \sum_{i=1}^N f(X_i) \approx E[f(X)]\]&lt;/span> where &lt;span class="math inline">\(X_1,X_2,...\)&lt;/span> is an iid sequence of random variables with the same distribution as &lt;span class="math inline">\(X\)&lt;/span>.&lt;/p>&lt;/li>
&lt;/ul>
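&lt;p>As a quick illustration (a sketch using the built-in &lt;em>rnorm()&lt;/em>), let us estimate &lt;span class="math inline">\(E(X^2)\)&lt;/span> for &lt;span class="math inline">\(X \sim N(0,1)\)&lt;/span>, whose true value is 1:&lt;/p>
&lt;pre class="r">&lt;code>set.seed(1)
N = 100000
x = rnorm(N)
mean(x^2) # Monte Carlo estimate of E(X^2); true value is 1&lt;/code>&lt;/pre>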
&lt;/div>
&lt;div id="bias-variance" class="section level2">
&lt;h2>Bias &amp;amp; Variance&lt;/h2>
&lt;ul>
&lt;li>The Monte Carlo estimate &lt;span class="math inline">\(Z_N^{MC}\)&lt;/span> for &lt;span class="math inline">\(E(f(X))\)&lt;/span>, has &lt;span class="math display">\[bias(Z_N^{MC})=0\]&lt;/span> and &lt;span class="math display">\[MSE(Z_N^{MC})=Var(Z_N^{MC})=\frac{1}{N} Var(f(X))\]&lt;/span>&lt;/li>
&lt;/ul>
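&lt;p>We can verify the variance formula empirically (a sketch): for &lt;span class="math inline">\(f(X)=X^2\)&lt;/span> with &lt;span class="math inline">\(X \sim N(0,1)\)&lt;/span> we have &lt;span class="math inline">\(Var(X^2)=2\)&lt;/span>, so the variance of &lt;span class="math inline">\(Z_N^{MC}\)&lt;/span> should be about &lt;span class="math inline">\(2/N\)&lt;/span>:&lt;/p>
&lt;pre class="r">&lt;code>set.seed(2)
N = 1000; R = 2000
Z = replicate(R, mean(rnorm(N)^2)) # R independent Monte Carlo estimates
var(Z) # sample variance of the estimator
2/N # theoretical Var(f(X))/N&lt;/code>&lt;/pre>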
&lt;/div>
&lt;div id="monte-carlo-integration" class="section level2">
&lt;h2>Monte Carlo Integration&lt;/h2>
&lt;ul>
&lt;li>&lt;p>Consider the integral &lt;span class="math inline">\({\int}_a^bf(x)dx\)&lt;/span>&lt;/p>&lt;/li>
&lt;li>&lt;p>Objective is to approximate this integral&lt;/p>&lt;/li>
&lt;li>&lt;p>Let &lt;span class="math inline">\(X_1,X_2,...\)&lt;/span> be iid &lt;span class="math inline">\(U(a,b)\)&lt;/span>, i.e., density of &lt;span class="math inline">\(X_j\)&lt;/span> is &lt;span class="math inline">\(\phi(x)=\frac{1}{b-a}I_{[a,b]}\)&lt;/span>&lt;/p>&lt;/li>
&lt;li>&lt;p>Then &lt;span class="math display">\[{\int}_a^bf(x)dx=(b-a){\int}_a^bf(x) \phi(x)dx=(b-a)E(f(X))\approx \frac{b-a}{N} \sum_{j=1}^N{f(X_j)}\]&lt;/span> for large &lt;span class="math inline">\(N\)&lt;/span>&lt;/p>&lt;/li>
&lt;/ul>
&lt;div id="example-1" class="section level3">
&lt;h3>Example&lt;/h3>
&lt;ul>
&lt;li>&lt;p>Evaluate the integral &lt;span class="math inline">\({\int}_0^{2\pi} e^{k\:cos(x)}dx\)&lt;/span>&lt;/p>&lt;/li>
&lt;li>&lt;p>We generate samples &lt;span class="math inline">\(X_j\)&lt;/span> from &lt;span class="math inline">\(U(0,2\pi)\)&lt;/span>&lt;/p>&lt;/li>
&lt;li>&lt;p>Then use the approximation &lt;span class="math display">\[{\int}_0^{2\pi} e^{k\:cos(x)}dx \approx \frac{2\pi}{N} \sum_{j=1}^{N}{e^{k\:cos(X_j)}}\]&lt;/span> (the code below takes &lt;span class="math inline">\(k=1\)&lt;/span>)&lt;/p>&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code>set.seed(123)
N = 1000
x = runif(N,min=0,max=(2*pi))
value = sum(exp(cos(x)))
value = (2*pi)*value/N
value&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] 7.901431&lt;/code>&lt;/pre>
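&lt;p>As a check (a side note, not from the original material): this integral has the closed form &lt;span class="math inline">\(2\pi I_0(k)\)&lt;/span>, where &lt;span class="math inline">\(I_0\)&lt;/span> is the modified Bessel function of the first kind, available in R as &lt;em>besselI()&lt;/em>:&lt;/p>
&lt;pre class="r">&lt;code>2*pi*besselI(1,nu=0) # exact value for k=1, approximately 7.95&lt;/code>&lt;/pre>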
&lt;/div>
&lt;div id="another-example" class="section level3">
&lt;h3>Another Example&lt;/h3>
&lt;ul>
&lt;li>&lt;p>&lt;strong>Problem:&lt;/strong> Estimate the c.d.f of &lt;span class="math inline">\(N(0,1)\)&lt;/span> for several values of the argument and then assess its accuracy.&lt;/p>&lt;/li>
&lt;li>&lt;p>The normal c.d.f can be expressed as &lt;span class="math display">\[\Phi(t)= {\int}_{-\infty}^t \frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}dx\]&lt;/span>&lt;/p>&lt;/li>
&lt;li>&lt;p>We shall use the Monte Carlo method to estimate &lt;span class="math inline">\(\Phi(t)\)&lt;/span> as &lt;span class="math display">\[\hat{\Phi}{(t)} = \frac{1}{n} \sum_{i=1}^n{I(X_i \le t)}\]&lt;/span> where &lt;span class="math inline">\(I(X_i \le t)=\)&lt;/span> 1 or 0 with probability &lt;span class="math inline">\(\Phi(t)\)&lt;/span> or &lt;span class="math inline">\(1-\Phi(t)\)&lt;/span> respectively, and the &lt;span class="math inline">\(X_i\)&lt;/span>’s are random samples from &lt;span class="math inline">\(N(0,1)\)&lt;/span>.&lt;/p>&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code>n = 1000
t = seq(-3,3,0.01)
x = NULL
phi.hat=NULL
phi=NULL
for(i in 1:length(t))
{
x = rnorm(n)
s = sum(x&amp;lt;=t[i])
phi.hat[i] = s/n
phi[i] = pnorm(t[i])
}
par(mfrow = c(1,2))
plot(t,phi,main=&amp;quot;Original c.d.f&amp;quot;,col=&amp;quot;red&amp;quot;,pch=19)
plot(t,phi.hat,main=&amp;quot;Estimated c.d.f&amp;quot;,col=&amp;quot;blue&amp;quot;,pch=19)&lt;/code>&lt;/pre>
&lt;p>&lt;img src="https://rajeshmajumderblog.netlify.app/blog/internal-project_vi/index_files/figure-html/unnamed-chunk-40-1.png" width="672" />&lt;/p>
&lt;/div>
&lt;/div>
&lt;div id="an-assignment-problem" class="section level2">
&lt;h2>An Assignment Problem&lt;/h2>
&lt;p>&lt;strong>Here, we have to do :&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;p>Draw a random sample of size 50 from &lt;span class="math inline">\(N(1,2)\)&lt;/span>&lt;/p>&lt;/li>
&lt;li>&lt;p>Draw another random sample of size 1000 from the same distribution &lt;span class="math inline">\(N(1,2)\)&lt;/span>&lt;/p>&lt;/li>
&lt;li>&lt;p>Calculate the test statistic: &lt;span class="math inline">\(T_n=\frac{\sqrt{n}(\bar{X_n}-1)}{s_n}\)&lt;/span>&lt;/p>&lt;/li>
&lt;li>&lt;p>Repeat this 1000 times.&lt;/p>&lt;/li>
&lt;li>&lt;p>Draw histograms of the &lt;span class="math inline">\(T_n\)&lt;/span>’s coming from the two different samples of the same population &lt;span class="math inline">\(N(1,2)\)&lt;/span>&lt;/p>&lt;/li>
&lt;li>&lt;p>Compare these two histograms with the Standard Normal distribution.&lt;/p>&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Solution:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Here, to plot the histograms, instead of using the famous &lt;em>ggplot2&lt;/em> package, I am using base R plotting functions.&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code>##--- Creating a Function to simulate 1000 test statistics for two different samples
simulation=function(len_1,len_2){
A=NULL
B=NULL
for(i in 1:1000)
{
Sam_1=rnorm(len_1,1,sqrt(2))
Sam_2=rnorm(len_2,1,sqrt(2))
Tn_1=(sqrt(length(Sam_1))*((sum(Sam_1)/length(Sam_1))-1))/sqrt(var(Sam_1))
Tn_2=(sqrt(length(Sam_2))*((sum(Sam_2)/length(Sam_2))-1))/sqrt(var(Sam_2))
A=c(A,Tn_1)
B=c(B,Tn_2)
}
Mat=as.data.frame(matrix(c(A,B),ncol=2,byrow = F))
names(Mat)=c(&amp;quot;Tn_1&amp;quot;,&amp;quot;Tn_2&amp;quot;)
return(Mat)
}
X=(simulation(50,1000)) # Data Table
X[1:10,] # Showing 1st 10 samples of the Data Table&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## Tn_1 Tn_2
## 1 0.8020946 0.20259551
## 2 -0.8509208 -1.40945355
## 3 -0.2805664 -1.05684656
## 4 -0.6615001 -0.44404425
## 5 2.3164142 -0.09538437
## 6 -2.3255364 -0.42208845
## 7 -0.2363790 0.60943958
## 8 -2.8176113 -0.40001562
## 9 -0.2177217 -2.25709292
## 10 -0.0575885 2.26107062&lt;/code>&lt;/pre>
&lt;pre class="r">&lt;code># Writing a message
writeLines(paste(c(&amp;quot;Omitting&amp;quot;,&amp;quot;the&amp;quot;,&amp;quot;rest&amp;quot;,&amp;quot;990&amp;quot;,&amp;quot;values&amp;quot;)),sep=&amp;quot; &amp;quot;)&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## Omitting the rest 990 values&lt;/code>&lt;/pre>
&lt;pre class="r">&lt;code>##--- Histogram of the 1st Sample
hist(X$Tn_1,
col=&amp;quot;red&amp;quot;,
xlab=&amp;quot;Tn&amp;quot;,
ylab=&amp;quot;Frequency&amp;quot;,
main=&amp;quot;For the Sample 1&amp;quot;,
density = 50)&lt;/code>&lt;/pre>
&lt;p>&lt;img src="https://rajeshmajumderblog.netlify.app/blog/internal-project_vi/index_files/figure-html/unnamed-chunk-41-1.png" width="672" />&lt;/p>
&lt;pre class="r">&lt;code>##--- Histogram of the 2nd Sample
hist(X$Tn_2,
col=12,
xlab=&amp;quot;Tn&amp;quot;,
ylab=&amp;quot;&amp;quot;,
main=&amp;quot;For the Sample 2&amp;quot;,
density = 40)&lt;/code>&lt;/pre>
&lt;p>&lt;img src="https://rajeshmajumderblog.netlify.app/blog/internal-project_vi/index_files/figure-html/unnamed-chunk-41-2.png" width="672" />&lt;/p>
&lt;pre class="r">&lt;code>##--- Preparing the density of the N(0,1)
a=seq(-3,3,by= 0.01) # Range of the sample points for N(0,1)
b=dnorm(a) # density of the N(0,1) for the above range.
##--- Comparing Plots :
hist(X$Tn_1, # histogram for the Sample 1
col=&amp;quot;red&amp;quot;,
xlab=&amp;quot;Tn&amp;quot;,
ylab=&amp;quot;Frequency&amp;quot;,
main=&amp;quot;Comparing Two Histograms coming from two\n different Samples of the same \nPopulation N(1,2) with the Standard Normal density&amp;quot;,
density = 50,
axes=F,
cex=4)
par(new=T) # For Overlap the new plot
hist(X$Tn_2,
col=12,
xlab=&amp;quot;&amp;quot;,
ylab=&amp;quot;&amp;quot;, # histogram for the Sample 2
main=&amp;quot;&amp;quot;,
density = 40,
axes = F)
par(new=T) # For Overlap the new plot
plot(a,b,
type =&amp;quot;l&amp;quot;,
xlab=&amp;quot;&amp;quot;,
ylab=&amp;quot;&amp;quot;, # Density curve of the N(0,1)
main = &amp;quot;&amp;quot;,
axes = F,
col=&amp;quot;darkgreen&amp;quot;,
lwd=3)
# Adding Legend
legend(&amp;quot;topright&amp;quot;,
legend = c(&amp;quot;Sample 1&amp;quot;,&amp;quot;Sample 2&amp;quot;,&amp;quot;PDF-N(0,1)&amp;quot;),
fill=c(&amp;quot;red&amp;quot;,12,&amp;quot;darkgreen&amp;quot;),
cex=0.6)
# Adding box
box()&lt;/code>&lt;/pre>
&lt;p>&lt;img src="https://rajeshmajumderblog.netlify.app/blog/internal-project_vi/index_files/figure-html/unnamed-chunk-41-3.png" width="672" />&lt;/p>
&lt;/div>
&lt;div id="brownian-motion" class="section level2">
&lt;h2>Brownian Motion&lt;/h2>
&lt;ul>
&lt;li>&lt;p>Now that the basics of Monte Carlo simulation and various random distributions have been introduced, let’s focus on using Monte Carlo methods to simulate paths of various &lt;a href="https://en.wikipedia.org/wiki/Stochastic_process">Stochastic Processes.&lt;/a>&lt;/p>&lt;/li>
&lt;li>&lt;p>Standard Brownian Motion on &lt;span class="math inline">\([0,T]\)&lt;/span> is a Stochastic Process &lt;span class="math inline">\((W(t),0\leq t \leq T)\)&lt;/span> which satisfies the following properties:&lt;/p>
&lt;ul>
&lt;li>&lt;span class="math inline">\(W(0) = 0\)&lt;/span>&lt;/li>
&lt;li>For any &lt;span class="math inline">\(k\)&lt;/span> and any &lt;span class="math inline">\(0 \leq t_1 \leq t_2 \leq \cdots \leq t_k \leq T\)&lt;/span>, the increments &lt;span class="math inline">\(W(t_i)-W(t_{i-1})\)&lt;/span> are independent.&lt;/li>
&lt;li>The difference &lt;span class="math inline">\(W(t)-W(s) \sim N(0,t-s)\)&lt;/span> for any &lt;span class="math inline">\(0\leq s&amp;lt;t \leq T\)&lt;/span>&lt;/li>
&lt;/ul>&lt;/li>
&lt;/ul>
&lt;p>As a consequence of the 1st and 3rd properties, &lt;span class="math inline">\(W(t)=W(t)-W(0) \sim N(0,t)\)&lt;/span>.&lt;/p>
&lt;p>On the other hand, a non-standard Brownian Motion has two parameters, just like the Normal Distribution, known as the &lt;strong>drift&lt;/strong> and the &lt;strong>diffusion&lt;/strong> coefficient. Using &lt;span class="math inline">\(W(t)\)&lt;/span> we can therefore define a Brownian Motion with drift &lt;span class="math inline">\(\mu\)&lt;/span> and diffusion coefficient &lt;span class="math inline">\(\sigma^2\)&lt;/span> through the &lt;strong>Stochastic Differential&lt;/strong> Equation (SDE)&lt;/p>
&lt;p>&lt;span class="math display">\[dX(t)=\mu(t)dt+\sigma(t)dW(t)\]&lt;/span>&lt;/p>
&lt;p>&lt;strong>Sample Paths Generations&lt;/strong>&lt;/p>
&lt;p>Solving the SDE presented above, we can write the recursion in terms of &lt;span class="math inline">\(X(t_i),\mu(s),\sigma(s)\)&lt;/span>: &lt;span class="math display">\[X(t_{i+1})=X(t_i)+\int_{t_i}^{t_{i+1}}\mu(s)ds+\sqrt{\int_{t_i}^{t_{i+1}}{\sigma^2(u)du}}\; Z_{i+1}\]&lt;/span> where &lt;span class="math inline">\(Z_{i+1} \sim N(0,1)\)&lt;/span>. Hence let us look at the code to generate paths, where I have assumed &lt;span class="math inline">\(\mu\)&lt;/span> and &lt;span class="math inline">\(\sigma\)&lt;/span> to be constant.&lt;/p>
&lt;pre class="r">&lt;code>Brownian = function() # This is a function to generate Browninan with drift 0.04 and diffusion 0.7
{
paths = 10
count = 5000
interval = 5/count
sample = matrix(0,nrow=(count+1),ncol=paths)
for(i in 1:paths)
{
sample[1,i] = 5
for(j in 2:(count+1))
{
sample[j,i] = sample[j-1,i]+interval*0.04+((interval)^.5)*rnorm(1,0,1)*0.7
}
}
cat(&amp;quot;E[W(2)] = &amp;quot;,mean(sample[2001,]),&amp;quot;\n&amp;quot;)
cat(&amp;quot;E[W(5)] = &amp;quot;,mean(sample[5001,]),&amp;quot;\n&amp;quot;)
matplot(sample,main=&amp;quot;Brownian&amp;quot;,xlab=&amp;quot;Time&amp;quot;,ylab=&amp;quot;Path&amp;quot;,type=&amp;quot;l&amp;quot;)
}
StandardBrownian = function() # This is a function to generate Standard Brownian motion with drift 0 and diffusion 1
{
paths = 10
count = 5000
interval = 5/count
sample = matrix(0,nrow=(count+1),ncol=paths)
for(i in 1:paths)
{
sample[1,i] = 0
for(j in 2:(count+1))
{
sample[j,i] = sample[j-1,i]+((interval)^.5)*rnorm(1)
}
}
cat(&amp;quot;E[W(2)] = &amp;quot;,mean(sample[2001,]),&amp;quot;\n&amp;quot;)
cat(&amp;quot;E[W(5)] = &amp;quot;,mean(sample[5001,]),&amp;quot;\n&amp;quot;)
matplot(sample,main=&amp;quot;Standard Brownian&amp;quot;,xlab=&amp;quot;Time&amp;quot;,ylab=&amp;quot;Path&amp;quot;,type=&amp;quot;l&amp;quot;)
}
StandardBrownian()&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## E[W(2)] = -0.2205643
## E[W(5)] = -0.2023842&lt;/code>&lt;/pre>
&lt;p>&lt;img src="https://rajeshmajumderblog.netlify.app/blog/internal-project_vi/index_files/figure-html/unnamed-chunk-42-1.png" width="672" />&lt;/p>
&lt;pre class="r">&lt;code>Brownian()&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## E[W(2)] = 5.321014
## E[W(5)] = 5.338038&lt;/code>&lt;/pre>
&lt;p>&lt;img src="https://rajeshmajumderblog.netlify.app/blog/internal-project_vi/index_files/figure-html/unnamed-chunk-42-2.png" width="672" />&lt;/p>
&lt;/div></description></item><item><title>Understanding what is Statistical Regularity with a Ludo &amp; Paper Game</title><link>https://rajeshmajumderblog.netlify.app/blog/internal-project_v/</link><pubDate>Tue, 26 Oct 2021 00:00:00 +0000</pubDate><guid>https://rajeshmajumderblog.netlify.app/blog/internal-project_v/</guid><description>
&lt;div id="TOC">
&lt;ul>
&lt;li>&lt;a href="#ststistical-regularity" id="toc-ststistical-regularity">Ststistical Regularity&lt;/a>&lt;/li>
&lt;li>&lt;a href="#the-ludo-paper-game" id="toc-the-ludo-paper-game">The ‘Ludo &amp;amp; Paper Game’&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#theory" id="toc-theory">Theory&lt;/a>&lt;/li>
&lt;li>&lt;a href="#r-code" id="toc-r-code">R code&lt;/a>&lt;/li>
&lt;/ul>&lt;/li>
&lt;/ul>
&lt;/div>
&lt;p>Hi,…&lt;/p>
&lt;p>In this tutorial we will learn what &lt;strong>Statistical Regularity&lt;/strong> is.&lt;/p>
&lt;p>Actually, I am writing this blog because when I first read about Statistical Regularity on the blog of one of my favorite teachers, &lt;a href="https://www.isical.ac.in/~arnabc/">Professor Arnab Chakraborty&lt;/a>, where he describes the concept with a beautiful example, I was very excited; but he did not give the actual solution of that example (i.e., the coding part). So here I am giving my solution, which is actually very easy, and I think that’s why he did not give the code 😅 😅 😅. But I was very happy to work it out by myself, and that’s why I would now like to share it with you. I guess you will enjoy it…&lt;/p>
&lt;p>So, before going to the example, let me give a brief introduction to what Statistical Regularity is.&lt;/p>
&lt;div id="ststistical-regularity" class="section level2">
&lt;h2>Statistical Regularity&lt;/h2>
&lt;p>Statistical regularity differs from mathematical patterns in the sense that it is rarely exactly replicated: instances are extremely similar but never identical. We see this all around us, in our fingerprints, for example, or the leaves on a tree.&lt;/p>
&lt;p>Statistical regularity is like a mysterious black box which takes random, unpredictable input and somehow digests the randomness to produce regular output. No doubt, if we can master this technique it should help us produce predictable output from unpredictable inputs! The quite predictable profits of casino owners and insurance companies are examples.&lt;/p>
&lt;p>Statistical regularity takes many forms, some more dramatic, some less. The simplest occurrence of the phenomenon was first proved mathematically by Jakob Bernoulli. The theorem and its proof hardly fill a page. But it took 25 years to figure out how to tackle randomness using mathematics to arrive at the proof!&lt;/p>
&lt;p>So, mathematically,&lt;/p>
&lt;p>Consider a random experiment. As is well known, the result of a single random experiment can never be correctly predicted before conducting it; but if the random experiment is carried out a large number of times under identical conditions, it will be seen that the &lt;strong>Relative Frequency (R.F)&lt;/strong> of an event stabilizes to a certain value.&lt;/p>
&lt;p>The Relative Frequency (R.F) of an outcome &lt;span class="math inline">\(O\)&lt;/span> of an experiment is the number of times &lt;span class="math inline">\(O\)&lt;/span> occurs, &lt;span class="math inline">\(f_n (O)\)&lt;/span>, divided by the total number of times, &lt;span class="math inline">\(n\)&lt;/span>, the experiment is carried out.&lt;/p>
&lt;p>So, the Relative Frequency (R.F) of an outcome &lt;span class="math inline">\(O\)&lt;/span> is:&lt;span class="math display">\[r_n (O)=\frac{f_n (O)}{n}\:\:;clearly\:0\leq r_n (O)\leq 1\]&lt;/span>. It is seen that when the experiment is repeated indefinitely, &lt;span class="math inline">\(r_n (O)\)&lt;/span> tends to a certain value, &lt;span class="math inline">\(p\)&lt;/span> (say); where &lt;span class="math inline">\(0 \leq p \leq 1\)&lt;/span>.&lt;/p>
&lt;p>For example:&lt;/p>
&lt;p>A coin was tossed several times and the number of times it fell Heads was noted. The following table shows the number of Heads (H) obtained in sets of &lt;span class="math inline">\(n\)&lt;/span> experiments.&lt;/p>
&lt;table>
&lt;thead>
&lt;tr class="header">
&lt;th align="left">Set&lt;/th>
&lt;th align="right">n=10&lt;/th>
&lt;th align="right">n=50&lt;/th>
&lt;th align="left">n=100&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr class="odd">
&lt;td align="left">1&lt;/td>
&lt;td align="right">4&lt;/td>
&lt;td align="right">29&lt;/td>
&lt;td align="left">47&lt;/td>
&lt;/tr>
&lt;tr class="even">
&lt;td align="left">2&lt;/td>
&lt;td align="right">4&lt;/td>
&lt;td align="right">22&lt;/td>
&lt;td align="left">52&lt;/td>
&lt;/tr>
&lt;tr class="odd">
&lt;td align="left">3&lt;/td>
&lt;td align="right">6&lt;/td>
&lt;td align="right">24&lt;/td>
&lt;td align="left">54&lt;/td>
&lt;/tr>
&lt;tr class="even">
&lt;td align="left">4&lt;/td>
&lt;td align="right">7&lt;/td>
&lt;td align="right">27&lt;/td>
&lt;td align="left">49&lt;/td>
&lt;/tr>
&lt;tr class="odd">
&lt;td align="left">5&lt;/td>
&lt;td align="right">5&lt;/td>
&lt;td align="right">31&lt;/td>
&lt;td align="left">53&lt;/td>
&lt;/tr>
&lt;tr class="even">
&lt;td align="left">6&lt;/td>
&lt;td align="right">5&lt;/td>
&lt;td align="right">26&lt;/td>
&lt;td align="left">51&lt;/td>
&lt;/tr>
&lt;tr class="odd">
&lt;td align="left">7&lt;/td>
&lt;td align="right">3&lt;/td>
&lt;td align="right">25&lt;/td>
&lt;td align="left">48&lt;/td>
&lt;/tr>
&lt;tr class="even">
&lt;td align="left">8&lt;/td>
&lt;td align="right">7&lt;/td>
&lt;td align="right">28&lt;/td>
&lt;td align="left">52&lt;/td>
&lt;/tr>
&lt;tr class="odd">
&lt;td align="left">9&lt;/td>
&lt;td align="right">5&lt;/td>
&lt;td align="right">21&lt;/td>
&lt;td align="left">47&lt;/td>
&lt;/tr>
&lt;tr class="even">
&lt;td align="left">10&lt;/td>
&lt;td align="right">6&lt;/td>
&lt;td align="right">23&lt;/td>
&lt;td align="left">55&lt;/td>
&lt;/tr>
&lt;tr class="odd">
&lt;td align="left">Total&lt;/td>
&lt;td align="right">52&lt;/td>
&lt;td align="right">256&lt;/td>
&lt;td align="left">508&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;ul>
&lt;li>&lt;p>For n=10, the Relative Frequency (R.F), r(H), varies from 0.3 to 0.7.&lt;/p>&lt;/li>
&lt;li>&lt;p>For n=50, the extreme values of r(H) become closer, being 0.42 &amp;amp; 0.62.&lt;/p>&lt;/li>
&lt;li>&lt;p>For n=100, r(H) varies between 0.47 &amp;amp; 0.55.&lt;/p>&lt;/li>
&lt;/ul>
&lt;p>The average values of &lt;span class="math inline">\(r(H)\)&lt;/span> were 0.520, 0.512, 0.508 for &lt;span class="math inline">\(n\)&lt;/span>= 10, 50 ,100, respectively. Thus one may conclude that as &lt;span class="math inline">\(n\)&lt;/span> increases Relative Frequency of H will be expected to be very close to 0.50.&lt;/p>
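&lt;p>The stabilization is easy to see in a quick simulation (a sketch, not part of the original tossing experiment): track the running relative frequency of Heads over many tosses of a fair coin.&lt;/p>
&lt;pre class="r">&lt;code>set.seed(7)
n = 10000
tosses = sample(c(0,1),n,replace=TRUE) # 1 = Head, fair coin
r = cumsum(tosses)/(1:n) # running relative frequency r_n(H)
plot(r,type=&amp;quot;l&amp;quot;,xlab=&amp;quot;n&amp;quot;,ylab=&amp;quot;r(H)&amp;quot;)
abline(h=0.5,col=&amp;quot;red&amp;quot;) # r_n(H) settles near 0.5&lt;/code>&lt;/pre>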
&lt;p>OK, so we have understood what Statistical Regularity is. Now it’s time to jump into our main example.&lt;/p>
&lt;/div>
&lt;div id="the-ludo-paper-game" class="section level2">
&lt;h2>The ‘Ludo &amp;amp; Paper Game’&lt;/h2>
&lt;div id="theory" class="section level3">
&lt;h3>Theory&lt;/h3>
&lt;ul>
&lt;li>We take four pieces of paper and write the following formulas on them:&lt;/li>
&lt;/ul>
&lt;p>1 &lt;span class="math display">\[X_{(new)}=0.8*X_{(old)}+0.1\]&lt;/span>
&lt;span class="math display">\[Y_{(new)}=0.8*Y_{(old)}+0.04\]&lt;/span>
2 &lt;span class="math display">\[X_{(new)}=0.5*X_{(old)}+0.25\]&lt;/span>
&lt;span class="math display">\[Y_{(new)}=0.5*Y_{(old)}+0.04\]&lt;/span>
3 &lt;span class="math display">\[X_{(new)}=0.355*X_{(old)}-0.355*Y_{(old)}+0.266\]&lt;/span>
&lt;span class="math display">\[Y_{(new)}=0.355*X_{(old)}+0.355*Y_{(old)}+0.078\]&lt;/span>
4 &lt;span class="math display">\[X_{(new)}=0.355*X_{(old)}+0.355*Y_{(old)}+0.378\]&lt;/span>
&lt;span class="math display">\[Y_{(new)}=-0.355*X_{(old)}+0.355*Y_{(old)}+0.434\]&lt;/span>&lt;/p>
&lt;ul>
&lt;li>&lt;p>These are all formulas to compute two numbers, &lt;span class="math inline">\(X_{(new)}\)&lt;/span> and &lt;span class="math inline">\(Y_{(new)}\)&lt;/span>, from two other numbers &lt;span class="math inline">\(X_{(old)}\)&lt;/span> and &lt;span class="math inline">\(Y_{(old)}\)&lt;/span>.&lt;/p>&lt;/li>
&lt;li>&lt;p>We shall play a game of Ludo with these! The Ludo board will be &lt;span class="math inline">\(\mathbb{R}^2\)&lt;/span>, and the counter will be a single point, which is initially at &lt;span class="math inline">\((X,Y)=(0,0)\)&lt;/span>. Draw one of the four pieces of paper at random and apply the formula on it to compute the new position of the counter. Keep on doing this. At every step you draw one of the four papers at random (the same paper may get picked many times). All the counter positions are marked as dots.&lt;/p>&lt;/li>
&lt;/ul>
&lt;/div>
&lt;div id="r-code" class="section level3">
&lt;h3>R code&lt;/h3>
&lt;pre class="r">&lt;code>play=function(n)
{
X.old=0
Y.old=0
X.all=NULL
Y.all=NULL
for(i in 1:n)
{
sam=sample(1:4,1,replace=T)
if(sam==1)
{
X.new=0.8*X.old+0.1
Y.new=0.8*Y.old+0.04
}
else if(sam==2)
{
X.new=0.5*X.old+0.25
Y.new=0.5*Y.old+0.4
}
else if(sam==3)
{
X.new=0.355*X.old-0.355*Y.old+0.266
Y.new=0.355*X.old+0.355*Y.old+0.078
}
else
{
X.new=0.355*X.old+0.355*Y.old+0.378
Y.new=-0.355*X.old+0.355*Y.old+0.434
}
X.all[i]=X.new
Y.all[i]=Y.new
X.old=X.new
Y.old=Y.new
}
plot(X.all,Y.all,
pch=16,
col=&amp;quot;darkgreen&amp;quot;,
cex=.7,
axes=F,
xlab=&amp;quot;&amp;quot;,
ylab=&amp;quot;&amp;quot;,
main=&amp;quot;Ludo &amp;amp; Paper Game Population&amp;quot;)
box()
}
Ans=play(100000) #--- Playing this game 100,000 times&lt;/code>&lt;/pre>
&lt;p>&lt;img src="https://rajeshmajumderblog.netlify.app/blog/internal-project_v/index_files/figure-html/unnamed-chunk-1-1.png" width="672" />&lt;/p>
&lt;p>So, actually, individual outcomes are random; but when the number of trials is very large, the experiment loses its randomness and produces a known structural shape, which is very interesting.&lt;/p>
&lt;p>Thank you for reading…&lt;/p>
&lt;/div>
&lt;/div></description></item><item><title>Write a user-defined function in R</title><link>https://rajeshmajumderblog.netlify.app/blog/internal-project_iii/</link><pubDate>Mon, 25 Oct 2021 00:00:00 +0000</pubDate><guid>https://rajeshmajumderblog.netlify.app/blog/internal-project_iii/</guid><description>
&lt;script src="https://rajeshmajumderblog.netlify.app/blog/internal-project_iii/index_files/header-attrs/header-attrs.js">&lt;/script>
&lt;div id="TOC">
&lt;ul>
&lt;li>&lt;a href="#introduction">INTRODUCTION&lt;/a>&lt;/li>
&lt;li>&lt;a href="#user-defined-functions">User Defined Functions&lt;/a>&lt;/li>
&lt;li>&lt;a href="#doing-more-than-one-computation">Doing more than one computation&lt;/a>&lt;/li>
&lt;li>&lt;a href="#default-argument-of-a-function">Default argument of a function&lt;/a>&lt;/li>
&lt;li>&lt;a href="#additional-arguments">Additional Arguments&lt;/a>&lt;/li>
&lt;li>&lt;a href="#data-types-of-arguments">Data types of arguments&lt;/a>&lt;/li>
&lt;li>&lt;a href="#sanity-checking-argument">Sanity checking argument&lt;/a>&lt;/li>
&lt;li>&lt;a href="#scope-of-variables">Scope of variables&lt;/a>&lt;/li>
&lt;li>&lt;a href="#recursive-function">Recursive Function&lt;/a>&lt;/li>
&lt;li>&lt;a href="#loops-in-r">Loops in R&lt;/a>&lt;/li>
&lt;li>&lt;a href="#while-loop">While loop&lt;/a>&lt;/li>
&lt;li>&lt;a href="#if-if-else">If &amp;amp; If-Else&lt;/a>&lt;/li>
&lt;li>&lt;a href="#if-else-function">If-Else function&lt;/a>&lt;/li>
&lt;li>&lt;a href="#else-if-ladder">Else if Ladder&lt;/a>&lt;/li>
&lt;li>&lt;a href="#switch-statement">Switch Statement&lt;/a>&lt;/li>
&lt;li>&lt;a href="#repeat-loop">Repeat Loop&lt;/a>&lt;/li>
&lt;li>&lt;a href="#plotting-functions">Plotting Functions&lt;/a>&lt;/li>
&lt;li>&lt;a href="#plotting-normal-curve">Plotting normal curve&lt;/a>&lt;/li>
&lt;li>&lt;a href="#sin1x-plot">sin(1/x) plot&lt;/a>&lt;/li>
&lt;li>&lt;a href="#zoom-at-the-origin">Zoom at the origin&lt;/a>&lt;/li>
&lt;li>&lt;a href="#solving-equation">Solving Equation&lt;/a>&lt;/li>
&lt;li>&lt;a href="#solving-equation-1">Solving Equation&lt;/a>&lt;/li>
&lt;li>&lt;a href="#solving-equation-2">Solving Equation&lt;/a>&lt;/li>
&lt;li>&lt;a href="#some-calculus-in-r">Some Calculus in R&lt;/a>&lt;/li>
&lt;li>&lt;a href="#optimization">Optimization&lt;/a>&lt;/li>
&lt;li>&lt;a href="#further-reading">Further reading&lt;/a>&lt;/li>
&lt;/ul>
&lt;/div>
&lt;div id="introduction" class="section level2">
&lt;h2>INTRODUCTION&lt;/h2>
&lt;p>In this tutorial, we will learn how to write our own custom functions in R. Although R has thousands of functions across thousands of packages, it is important to know how to build a customized function.&lt;/p>
&lt;/div>
&lt;div id="user-defined-functions" class="section level2">
&lt;h2>User Defined Functions&lt;/h2>
&lt;ul>
&lt;li>&lt;p>Functions are created using the &lt;strong>&lt;em>function()&lt;/em>&lt;/strong> directive and are
stored as R objects just like anything else. In particular, they are R
objects of class “function”.&lt;/p>&lt;/li>
&lt;li>&lt;p>The basic format of the code is&lt;/p>&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>function_name = function(arguments)&lt;/strong>&lt;br />
&lt;strong>{&lt;/strong>
&lt;strong>main computation to be done&lt;/strong>
&lt;strong>}&lt;/strong>&lt;/p>
&lt;pre class="r">&lt;code>#---define a function
testfunction = function(x,y)
{
x+y
}
#--- call the function with arguments 2,5
testfunction(2,5)&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] 7&lt;/code>&lt;/pre>
&lt;/div>
&lt;div id="doing-more-than-one-computation" class="section level2">
&lt;h2>Doing more than one computation&lt;/h2>
&lt;ul>
&lt;li>When a function performs more than one task and returns multiple objects, &lt;strong>&lt;em>return()&lt;/em>&lt;/strong> is used to collect all the outputs in the form of a vector.&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code>testfunction = function(x,y)
{
sum= x+y
prod= x*y
return(c(Sum=sum,Product=prod))
}
testfunction(2,5)&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## Sum Product
## 7 10&lt;/code>&lt;/pre>
&lt;ul>
&lt;li>Note that the two outputs can be accessed separately, as&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code>result = testfunction(2,5)
result[1]&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## Sum
## 7&lt;/code>&lt;/pre>
&lt;pre class="r">&lt;code>result[2]&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## Product
## 10&lt;/code>&lt;/pre>
&lt;ul>
&lt;li>Alternatively, multiple outputs can be extracted using &lt;strong>&lt;em>list()&lt;/em>&lt;/strong>. This enables us to extract by names (along with indices).&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code>testfunction = function(x,y)
{
sum= x+y
prod= x*y
output=list(Sum=sum,Product=prod)
return(output)
}
output= testfunction(2,5)&lt;/code>&lt;/pre>
&lt;pre class="r">&lt;code>output$Sum&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] 7&lt;/code>&lt;/pre>
&lt;pre class="r">&lt;code>output$Product&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] 10&lt;/code>&lt;/pre>
&lt;/div>
&lt;div id="default-argument-of-a-function" class="section level2">
&lt;h2>Default argument of a function&lt;/h2>
&lt;ul>
&lt;li>&lt;p>R provides a way to specify default values for the arguments while defining the function.&lt;/p>&lt;/li>
&lt;li>&lt;p>These default values are used when the function is called, unless other values are supplied in the call.&lt;/p>&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code>#--- initializing x=1 &amp;amp; y=1
testfunction = function(x=1,y=1)
{
sum= x+y
prod= x*y
#--- Creates the output list
output=list(Sum=sum,Product=prod)
return(output)
}
testfunction() #-- calling function with no arguments&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## $Sum
## [1] 2
##
## $Product
## [1] 1&lt;/code>&lt;/pre>
&lt;/div>
&lt;div id="additional-arguments" class="section level2">
&lt;h2>Additional Arguments&lt;/h2>
&lt;ul>
&lt;li>Provision for additional arguments (&lt;em>probably optional arguments, which cannot be decided beforehand&lt;/em>) can be done using “&lt;strong>…&lt;/strong>”&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code>testfunction = function(x=1,y=1,...)
{
sum= x+y
prod= x*y
#--- Creates the output list
output=list(Sum=sum,Product=prod)
return(output)
}
testfunction(2,5,z=12) #-- z is an extra argument which has no use in this function&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## $Sum
## [1] 7
##
## $Product
## [1] 10&lt;/code>&lt;/pre>
&lt;/div>
&lt;div id="data-types-of-arguments" class="section level2">
&lt;h2>Data types of arguments&lt;/h2>
&lt;ul>
&lt;li>Since the types of arguments are not specified (&lt;em>at the time of definition&lt;/em>), the arguments can be of any data type, provided the &lt;strong>internal code of the function is conformable with that data type&lt;/strong>&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code>testfunction = function(x=1,y=1,...)
{
sum= x+y
prod= x*y
#--- Creates the output list
output=list(Sum=sum,Product=prod)
return(output)
}
testfunction(2,5,z=12) #-- calling with numeric arguments&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## $Sum
## [1] 7
##
## $Product
## [1] 10&lt;/code>&lt;/pre>
&lt;pre class="r">&lt;code>#-- calling with characters
testfunction(&amp;quot;F&amp;quot;,&amp;quot;M&amp;quot;)&lt;/code>&lt;/pre>
&lt;p>&lt;strong>&lt;span class="math inline">\(\color{red}{\text{Error in x+y : non-numeric argument to binary operator}}\)&lt;/span>&lt;/strong>&lt;/p>
&lt;/div>
&lt;div id="sanity-checking-argument" class="section level2">
&lt;h2>Sanity checking argument&lt;/h2>
&lt;ul>
&lt;li>&lt;p>So how can we stop a function when the user calls it with non-conformable arguments?&lt;/p>&lt;/li>
&lt;li>&lt;p>A good practice is to write functions so that, when called, they check whether the supplied arguments make sense before entering the main body of the function.&lt;/p>&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code>testfunction = function(x=1,y=1,...)
{
#-- check if the arguments are not characters
stopifnot(typeof(x)!=&amp;quot;character&amp;quot;,typeof(y)!=&amp;quot;character&amp;quot;)
sum= x+y
prod= x*y
#--- Creates the output list
output=list(Sum=sum,Product=prod)
return(output)
}
testfunction(&amp;quot;F&amp;quot;,&amp;quot;M&amp;quot;)&lt;/code>&lt;/pre>
&lt;p>&lt;strong>&lt;span class="math inline">\(\color{red}{\text{Error in testfunction(&amp;quot;F&amp;quot;,&amp;quot;M&amp;quot;) : typeof(x) != &amp;quot;character&amp;quot; is not TRUE}}\)&lt;/span>&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>The &lt;strong>stopifnot&lt;/strong> function halts the execution of the function (&lt;em>with an error message&lt;/em>) if any of its arguments does not evaluate to &lt;strong>TRUE&lt;/strong>.&lt;/li>
&lt;/ul>
&lt;/div>
&lt;div id="scope-of-variables" class="section level2">
&lt;h2>Scope of variables&lt;/h2>
&lt;ul>
&lt;li>When we define a &lt;strong>variable within a function&lt;/strong>, it will be local and will not affect any &lt;strong>global variable&lt;/strong> even if the name matches.&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code>f_outer=function()
{
a=2
f_inner=function()
{
b=5
}
}
c=10&lt;/code>&lt;/pre>
&lt;ul>
&lt;li>Then variable &lt;strong>c&lt;/strong> is global to both &lt;strong>f_outer&lt;/strong> and &lt;strong>f_inner&lt;/strong>. For &lt;strong>f_inner&lt;/strong>, variable &lt;strong>b&lt;/strong> is local while &lt;strong>a&lt;/strong> comes from the enclosing function; for &lt;strong>f_outer&lt;/strong>, &lt;strong>a&lt;/strong> is local, and &lt;strong>b&lt;/strong> (defined inside &lt;strong>f_inner&lt;/strong>) is not visible at all.&lt;/li>
&lt;/ul>
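A concrete check of this behaviour (a minimal sketch, not in the original slides — the variable names are illustrative):

```r
a = 10                #-- global variable
f = function()
{
  a = 2               #-- local; does not touch the global a
  a
}
f()                   #-- returns 2
a                     #-- still 10
```

Assigning to `a` inside `f` creates a new local binding; the global `a` is unchanged after the call.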
&lt;/div>
&lt;div id="recursive-function" class="section level2">
&lt;h2>Recursive Function&lt;/h2>
&lt;ul>
&lt;li>R supports recursive function, i.e., a function that calls itself recursively.&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code>#-- Creating a recursive function
fact= function(x)
{
if(x==0)
{
return(1)
}
else
{
return(x+fact(x-1))
}
}
fact(5) #-- calling the function with x=5&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] 16&lt;/code>&lt;/pre>
&lt;/div>
&lt;div id="loops-in-r" class="section level2">
&lt;h2>Loops in R&lt;/h2>
&lt;ul>
&lt;li>&lt;p>Loops help to repeat a job. We first start with the for loop.&lt;/p>&lt;/li>
&lt;li>&lt;p>The syntax is
&lt;strong>for(variable in sequence)&lt;/strong>
&lt;strong>{&lt;/strong>
&lt;strong>expression to be evaluated&lt;/strong>
&lt;strong>}&lt;/strong>&lt;/p>&lt;/li>
&lt;li>&lt;p>Here sequence is an expression which evaluates to a vector (&lt;em>not necessarily an arithmetic progression&lt;/em>)&lt;/p>&lt;/li>
&lt;li>&lt;p>For example, all the following are valid
&lt;strong>for(i in 1:10)&lt;/strong>
&lt;strong>for(i in c(2,3,7,9,13,17,19,23))&lt;/strong>
&lt;strong>for(i in c("A","B","C"))&lt;/strong>&lt;/p>&lt;/li>
&lt;li>&lt;p>The number of times the expression in the loop is evaluated equals the length of the sequence.&lt;/p>&lt;/li>
&lt;/ul>
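As a quick illustration (not in the original slides), the following loop sums the first 10 natural numbers; the body runs once per element of the sequence `1:10`:

```r
#--- Sum the first 10 natural numbers with a for loop
total = 0
for(i in 1:10)
{
  total = total + i
}
total  #-- 55
```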
&lt;/div>
&lt;div id="while-loop" class="section level2">
&lt;h2>While loop&lt;/h2>
&lt;ul>
&lt;li>&lt;p>The syntax is
&lt;strong>while(condition)&lt;/strong>
&lt;strong>{&lt;/strong>
&lt;strong>expression to be evaluated&lt;/strong>
&lt;strong>}&lt;/strong>&lt;/p>&lt;/li>
&lt;li>&lt;p>The loop repeats its action as long as the test condition is satisfied, and stops once the condition becomes FALSE.&lt;/p>&lt;/li>
&lt;li>&lt;p>Unlike the for loop, we need not know in advance how many times the loop will repeat.&lt;/p>&lt;/li>
&lt;/ul>
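For example (an illustrative sketch, not from the original slides), the following loop keeps doubling x until it reaches at least 100; we do not know beforehand how many iterations this takes:

```r
#--- Double x until it is at least 100
x = 1
while(x < 100)
{
  x = 2*x
}
x  #-- 128, after 7 doublings
```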
&lt;/div>
&lt;div id="if-if-else" class="section level2">
&lt;h2>If &amp;amp; If-Else&lt;/h2>
&lt;ul>
&lt;li>&lt;p>The syntax for if statement is
if(condition)
{
expression
}&lt;/p>&lt;/li>
&lt;li>&lt;p>For a binary situation we can use if-else
if(condition)
{
expression 1
}
else
{
expression 2
}&lt;/p>&lt;/li>
&lt;/ul>
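A small example with a made-up marks value. Note that at the top level of an R script, `else` must appear on the same line as the closing brace of the if block:

```r
marks = 72
if(marks > 80)
{
  category = "Good"
} else
{
  category = "Fair"
}
category  #-- "Fair"
```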
&lt;/div>
&lt;div id="if-else-function" class="section level2">
&lt;h2>If-Else function&lt;/h2>
&lt;ul>
&lt;li>&lt;p>An alternative, often better, way to write if-else statements is the &lt;strong>ifelse()&lt;/strong> function.&lt;/p>&lt;/li>
&lt;li>&lt;p>The syntax is
&lt;strong>new variable= ifelse(Some Condition, Value of new variable if condition is true, value if condition is false)&lt;/strong>&lt;/p>&lt;/li>
&lt;li>&lt;p>e.g. 
&lt;strong>category= ifelse(marks&amp;gt;80, "Good", "Fair")&lt;/strong>
assigns the value Good if marks is more than 80 and Fair otherwise.&lt;/p>&lt;/li>
&lt;li>&lt;p>The additional advantage is that, in the condition, this function can compare a vector with a scalar (&lt;em>interpreted as each element compared to the scalar&lt;/em>)&lt;/p>&lt;/li>
&lt;/ul>
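For instance, applying the marks example to a whole vector of (made-up) marks at once — `ifelse()` compares each element with 80 and returns a vector of the same length:

```r
marks = c(85, 60, 92, 75)
#--- each element of marks is compared with the scalar 80
category = ifelse(marks > 80, "Good", "Fair")
category  #-- "Good" "Fair" "Good" "Fair"
```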
&lt;/div>
&lt;div id="else-if-ladder" class="section level2">
&lt;h2>Else if Ladder&lt;/h2>
&lt;ul>
&lt;li>When we have more than two cases we can use else-if ladder&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code>f= function(x)
{
if(x==1) print(a)
else if(x==2) print(b)
else print(c)
}&lt;/code>&lt;/pre>
&lt;/div>
&lt;div id="switch-statement" class="section level2">
&lt;h2>Switch Statement&lt;/h2>
&lt;ul>
&lt;li>&lt;p>An alternative and faster way is the &lt;strong>switch()&lt;/strong> statement.&lt;/p>&lt;/li>
&lt;li>&lt;p>The basic syntax is &lt;strong>switch(statement,list)&lt;/strong>&lt;/p>&lt;/li>
&lt;li>&lt;p>Here &lt;strong>statement&lt;/strong> is evaluated and based on this value, the corresponding item in the &lt;strong>list&lt;/strong> is returned.&lt;/p>&lt;/li>
&lt;li>&lt;p>e.g. &lt;strong>switch(2,"A","B","C")&lt;/strong> gives the answer "B". It selects item no. 2 from the list.&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>switch(4,"A","B","C")&lt;/strong> gives NULL as there is no item with index 4 in the list.&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>switch("color","color"="red","shape"="round","length"=5)&lt;/strong> gives the answer red (&lt;em>it matches the string&lt;/em>)&lt;/p>&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code>stat= function(x,type)
{
switch(type,&amp;quot;mean&amp;quot;=mean(x),
&amp;quot;median&amp;quot;=median(x),
&amp;quot;sd&amp;quot;=sd(x))
} #--- function ends here
stat(1:10,&amp;quot;mean&amp;quot;) #-- call the function with mean&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] 5.5&lt;/code>&lt;/pre>
&lt;pre class="r">&lt;code>stat(1:10,&amp;quot;median&amp;quot;) #-- call the function with median&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] 5.5&lt;/code>&lt;/pre>
&lt;/div>
&lt;div id="repeat-loop" class="section level2">
&lt;h2>Repeat Loop&lt;/h2>
&lt;ul>
&lt;li>Basic syntax is&lt;/li>
&lt;/ul>
&lt;p>repeat
{
expression to be evaluated
}&lt;/p>
&lt;ul>
&lt;li>&lt;p>There is no built-in way of termination; the loop runs forever by default.&lt;/p>&lt;/li>
&lt;li>&lt;p>We need to manually terminate the loop using &lt;strong>break&lt;/strong> statement.&lt;/p>&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code>x=1 #-- Take any value x as 1
repeat
{ #-- Loop begin here
x=x+1
if(x==6) break #-- manual instruction to exit loop
} #-- Loop ends here
x #-- checking the value of x&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## [1] 6&lt;/code>&lt;/pre>
&lt;/div>
&lt;div id="plotting-functions" class="section level2">
&lt;h2>Plotting Functions&lt;/h2>
&lt;ul>
&lt;li>&lt;p>Any function can be plotted using &lt;strong>curve()&lt;/strong>&lt;/p>&lt;/li>
&lt;li>&lt;p>The syntax is
&lt;strong>curve(function,from,to,n,add=T/F,…)&lt;/strong>
where &lt;strong>from&lt;/strong> and &lt;strong>to&lt;/strong> give the range over which the function is plotted and &lt;strong>n&lt;/strong> (&lt;em>integer&lt;/em>) is the number of points at which the function is evaluated. &lt;strong>add=TRUE/FALSE&lt;/strong> indicates whether to add this curve to an existing plot or not.&lt;/p>&lt;/li>
&lt;li>&lt;p>To get more information about its arguments type &lt;strong>?curve&lt;/strong>&lt;/p>&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code> myfun= function(x)
{
x*(1-x)
}
curve(myfun,from=0,to=1)&lt;/code>&lt;/pre>
&lt;p>&lt;img src="https://rajeshmajumderblog.netlify.app/blog/internal-project_iii/index_files/figure-html/unnamed-chunk-20-1.png" width="672" />&lt;/p>
&lt;/div>
&lt;div id="plotting-normal-curve" class="section level2">
&lt;h2>Plotting normal curve&lt;/h2>
&lt;pre class="r">&lt;code> #-- dnorm gives pdf of N(0,1)
curve(dnorm,from = -4,to=4,n=500)&lt;/code>&lt;/pre>
&lt;p>&lt;img src="https://rajeshmajumderblog.netlify.app/blog/internal-project_iii/index_files/figure-html/unnamed-chunk-22-1.png" width="672" />&lt;/p>
&lt;/div>
&lt;div id="sin1x-plot" class="section level2">
&lt;h2>sin(1/x) plot&lt;/h2>
&lt;pre class="r">&lt;code> curve(sin(1/x),from = -2,to = 2)&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## Warning in sin(1/x): NaNs produced&lt;/code>&lt;/pre>
&lt;p>&lt;img src="https://rajeshmajumderblog.netlify.app/blog/internal-project_iii/index_files/figure-html/unnamed-chunk-23-1.png" width="672" />&lt;/p>
&lt;/div>
&lt;div id="zoom-at-the-origin" class="section level2">
&lt;h2>Zoom at the origin&lt;/h2>
&lt;pre class="r">&lt;code> curve(sin(1/x),from = -0.1,to = 0.1)&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## Warning in sin(1/x): NaNs produced&lt;/code>&lt;/pre>
&lt;p>&lt;img src="https://rajeshmajumderblog.netlify.app/blog/internal-project_iii/index_files/figure-html/unnamed-chunk-24-1.png" width="672" />&lt;/p>
&lt;/div>
&lt;div id="solving-equation" class="section level2">
&lt;h2>Solving Equation&lt;/h2>
&lt;ul>
&lt;li>&lt;p>We already know that a system of linear equations can be solved using &lt;strong>&lt;em>solve()&lt;/em>&lt;/strong>&lt;/p>&lt;/li>
&lt;li>&lt;p>For equations involving one variable we can use &lt;strong>&lt;em>uniroot()&lt;/em>&lt;/strong>&lt;/p>&lt;/li>
&lt;li>&lt;p>The syntax is &lt;strong>&lt;em>uniroot(function,interval,…)&lt;/em>&lt;/strong>&lt;/p>&lt;/li>
&lt;li>&lt;p>To solve &lt;span class="math display">\[e^x=\sin(x)\]&lt;/span> we write&lt;/p>&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code> uniroot(function(x) exp(x)-sin(x),c(-5,5))&lt;/code>&lt;/pre>
&lt;/div>
&lt;div id="solving-equation-1" class="section level2">
&lt;h2>Solving Equation&lt;/h2>
&lt;pre>&lt;code>## $root
## [1] -3.183063
##
## $f.root
## [1] -1.359327e-08
##
## $iter
## [1] 8
##
## $init.it
## [1] NA
##
## $estim.prec
## [1] 6.103516e-05&lt;/code>&lt;/pre>
&lt;/div>
&lt;div id="solving-equation-2" class="section level2">
&lt;h2>Solving Equation&lt;/h2>
&lt;ul>
&lt;li>&lt;p>For finding real or complex roots of a polynomial use &lt;strong>&lt;em>polyroot()&lt;/em>&lt;/strong>&lt;/p>&lt;/li>
&lt;li>&lt;p>For solving roots of &lt;span class="math inline">\(n\)&lt;/span> non-linear equations we can use &lt;strong>&lt;em>multiroot()&lt;/em>&lt;/strong> from the &lt;strong>&lt;em>rootSolve&lt;/em>&lt;/strong> package.&lt;/p>&lt;/li>
&lt;/ul>
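For example (an added illustration), the roots of x² − 1 = 0, with coefficients supplied in increasing powers of x:

```r
#--- roots of -1 + 0*x + 1*x^2 = 0
r = polyroot(c(-1, 0, 1))
r  #-- 1 and -1 (as complex numbers, up to numerical error)
```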
&lt;/div>
&lt;div id="some-calculus-in-r" class="section level2">
&lt;h2>Some Calculus in R&lt;/h2>
&lt;ul>
&lt;li>&lt;p>Definite integrals can be computed using &lt;strong>&lt;em>integrate()&lt;/em>&lt;/strong>&lt;/p>&lt;/li>
&lt;li>&lt;p>e.g. &lt;span class="math inline">\(\int_0^1(x^2)dx\)&lt;/span> can be done using&lt;/p>&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code> integrate(function(x) x^2,0,1)&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## 0.3333333 with absolute error &amp;lt; 3.7e-15&lt;/code>&lt;/pre>
&lt;ul>
&lt;li>For derivatives, we use &lt;strong>&lt;em>deriv()&lt;/em>&lt;/strong>&lt;/li>
&lt;/ul>
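A small added illustration using base R's symbolic differentiation: `D()` (the simpler cousin of `deriv()`) differentiates an expression with respect to a named variable; here the derivative of x² is 2x:

```r
#--- symbolic derivative of x^2 with respect to x
dfdx = D(expression(x^2), "x")
dfdx          #-- 2 * x
x = 3
eval(dfdx)    #-- 6
```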
&lt;/div>
&lt;div id="optimization" class="section level2">
&lt;h2>Optimization&lt;/h2>
&lt;ul>
&lt;li>&lt;p>Maximum or Minimum value of a function can be found using &lt;strong>&lt;em>optimize()&lt;/em>&lt;/strong>&lt;/p>&lt;/li>
&lt;li>&lt;p>&lt;strong>&lt;em>optimize(function,interval,maximum=TRUE/FALSE)&lt;/em>&lt;/strong>&lt;/p>&lt;/li>
&lt;/ul>
&lt;pre class="r">&lt;code> optimise(function(x) exp(-x),c(0,5))&lt;/code>&lt;/pre>
&lt;pre>&lt;code>## $minimum
## [1] 4.999936
##
## $objective
## [1] 0.006738379&lt;/code>&lt;/pre>
&lt;ul>
&lt;li>There are other functions for optimization like &lt;strong>&lt;em>optim()&lt;/em>&lt;/strong>,&lt;strong>&lt;em>nlm()&lt;/em>&lt;/strong>,&lt;strong>&lt;em>constrOptim()&lt;/em>&lt;/strong>.&lt;/li>
&lt;/ul>
&lt;/div>
&lt;div id="further-reading" class="section level2">
&lt;h2>Further reading&lt;/h2>
&lt;p>&lt;a href="https://rstudio-education.github.io/hopr/">&lt;em>Garrett Grolemund&lt;/em>, &lt;strong>Hands-On Programming with R&lt;/strong>, &lt;em>O’REILLY&lt;/em>&lt;/a>&lt;/p>
&lt;/div></description></item><item><title>Performance of LASSO when one or more covariate(s) is/are Missing Not at Random(MNAR)</title><link>https://rajeshmajumderblog.netlify.app/blog/internal-project/</link><pubDate>Mon, 23 Aug 2021 00:00:00 +0000</pubDate><guid>https://rajeshmajumderblog.netlify.app/blog/internal-project/</guid><description>
&lt;script src="https://rajeshmajumderblog.netlify.app/blog/internal-project/index_files/header-attrs/header-attrs.js">&lt;/script>
&lt;div id="about" class="section level2">
&lt;h2>About&lt;/h2>
&lt;p>This is my M.Sc. final year project.&lt;/p>
&lt;p>I did this project under the supervision of my mentor &lt;a href="https://www.wbsu.ac.in/faculty/dr-sumanta-adhya/">Dr. Sumanta Adhya, WBSU.&lt;/a>&lt;/p>
&lt;p>In this project, I have studied how LASSO performs the variable-selection task under multicollinearity when the data are affected by missing values and the missingness is not at random. I have investigated different LASSO solutions on simulated data sets, trying to find a method that works well in this situation. In this project, I have proposed a new methodology, “Inverse Probability Weighted Logistic Lasso Estimation”, which gives a better solution than complete case analysis under the MNAR mechanism.&lt;/p>
&lt;p>Here I have compared a total of five LASSO solution techniques: “LASSO on the original data set (when all values are known)”, “LASSO on the complete data set (removing all missing observations)”, “IPW-LASSO on the complete data set using known (actual) missing probabilities”, “IPW-LASSO on the complete data set using estimated (MLE) missing probabilities”, and “IPW-LASSO on the complete data set using estimated (logistic LASSO) missing probabilities”. I have shown that the last of these is a better solution than simple complete case analysis when the missing mechanism is MNAR.&lt;/p>
&lt;p>&lt;strong>&lt;em>Keywords&lt;/em>&lt;/strong> : &lt;em>MNAR&lt;/em>, &lt;em>Logistic Regression&lt;/em>, &lt;em>LASSO&lt;/em>, &lt;em>IPW&lt;/em>, &lt;em>IPW-LASSO&lt;/em>.&lt;/p>
&lt;p>&lt;div class="alert alert-note">
&lt;div>
Click the &lt;em>Slide&lt;/em> button above to see the project presentation.
&lt;/div>
&lt;/div>
&lt;/p>
&lt;p>&lt;div class="alert alert-note">
&lt;div>
Click the &lt;em>Report&lt;/em> button above to see the project document.
&lt;/div>
&lt;/div>
&lt;/p>
&lt;p>&lt;div class="alert alert-note">
&lt;div>
Click the &lt;em>github&lt;/em> button above to see the R code.
&lt;/div>
&lt;/div>
&lt;/p>
&lt;/div></description></item><item><title>A Study of effect of different Diet on Weight loss</title><link>https://rajeshmajumderblog.netlify.app/blog/internal-project_ii/</link><pubDate>Mon, 14 Jan 2019 00:00:00 +0000</pubDate><guid>https://rajeshmajumderblog.netlify.app/blog/internal-project_ii/</guid><description>
&lt;script src="https://rajeshmajumderblog.netlify.app/blog/internal-project_ii/index_files/header-attrs/header-attrs.js">&lt;/script>
&lt;div id="about" class="section level2">
&lt;h2>About&lt;/h2>
&lt;p>This is my B.Sc. final year project.&lt;/p>
&lt;p>I did this project under the supervision of my mentor &lt;strong>Dr. Arabinda Das, A.P.C. College.&lt;/strong>&lt;/p>
&lt;p>In this project, I worked on diet data, studying how different diets actually affected weight loss.&lt;/p>
&lt;p>&lt;strong>&lt;em>Keywords&lt;/em>&lt;/strong>: &lt;em>ANOVA&lt;/em>, &lt;em>ANCOVA&lt;/em>, &lt;em>Shapiro-Wilk Test&lt;/em>, &lt;em>Kolmogorov-Smirnov test&lt;/em>, &lt;em>Bartlett’s test&lt;/em>, &lt;em>Levene’s Test&lt;/em>, &lt;em>Tukey HSD test&lt;/em>.&lt;/p>
&lt;p>&lt;div class="alert alert-note">
&lt;div>
Click the &lt;em>Report&lt;/em> button above to see the project document.
&lt;/div>
&lt;/div>
&lt;/p>
&lt;p>&lt;div class="alert alert-note">
&lt;div>
Unfortunately, I am unable to give the data set and the code because, I have lost all the necessary documents regarding this project, due to a computer crash. But I am trying to do this again from scratch.
&lt;/div>
&lt;/div>
&lt;/p>
&lt;/div></description></item></channel></rss>