Add Week 2

parent c353f9a0
@@ -10,7 +10,7 @@ There are a lot of papers to read, and for good reason. Machine learning is adva
What do you need for this?
* A functioning computer that can run SSH, run R and Python, and read PDFs
* A basic understanding of statistics
* A conceptual understanding of calculus (especially gradients)
@@ -22,6 +22,7 @@ I think it's better to learn as you go. If you spend too much time on preliminar
* Read through the paper once without taking notes. Then write a 1-2 sentence summary. This readthrough is just to get a high-level understanding of the paper.
* Read through the paper again and take notes. Write a more thorough 2-3 paragraph summary. Make sure to contextualize this summary. What makes this paper interesting?
* If you're stuck for more than 30-45 minutes, ask lots of questions in `#ml-study-group`.
* Where possible, I have included links directly to the PDFs (possibly using the GMU Mutex). This means my links may differ from the links in the Credits section. If this causes issues for you, please message me (`@ksarkhel`) on `#ml-study-group`.
## Credits
@@ -40,13 +41,58 @@ Almost all of the material has come from the following sources:
Week 1 is an easy week. This should give you some background to get started.
#### Data Science
These papers provide a background as to what data science looks like in actual organizations.
* [Data scientists mostly just do arithmetic and that’s a good thing](https://m.signalvnoise.com/data-scientists-mostly-just-do-arithmetic-and-that-s-a-good-thing-c6371885f7f6); Noah Lorang (2016).
* [Enterprise Data Analysis and Visualization: An Interview Study](https://idl.cs.washington.edu/files/2012-EnterpriseAnalysisInterviews-VAST.pdf); Sean Kandel, Andreas Paepcke, Joseph Hellerstein, Jeffrey Heer (2012).
* [50 years of data science](https://www-tandfonline-com.mutex.gmu.edu/doi/abs/10.1080/10618600.2017.1384734); David Donoho (2017).
#### Deep Learning
The following paper surveys deep learning (a high-level overview of developments in the field). The associated video also introduces the concept of using deep learning to classify images.
* Three Giants' Survey: [Deep learning](http://www.cs.toronto.edu/~hinton/absps/NatureDeepReview.pdf); LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton (2015).
* Deep Learning 2018 [Lesson 1: Recognizing Cats and Dogs](https://course.fast.ai/lessons/lesson1.html)
#### Statistical Learning
This chapter in ISL introduces R and the concept of statistical learning and some _very important concepts_ for assessing model accuracy.
* Introduction to Statistical Learning: Chapter 2 [[pdf]](http://www-bcf.usc.edu/~gareth/ISL/ISLR%20Seventh%20Printing.pdf)
### Week 2
This is where we start having some fun.
#### Data Science
These papers discuss two key topics: how data is collected and how to work on a team with other data scientists.
* [Tidy data](https://www.jstatsoft.org/index.php/jss/article/view/v059i10/v59i10.pdf); Hadley Wickham (2014).
* [Data organization in spreadsheets](https://www-tandfonline-com.mutex.gmu.edu/doi/full/10.1080/00031305.2017.1375989); Karl W Broman, Kara Woo (2017).
* [Best practices for using Google Sheets in your data project](https://matthewlincoln.net/2018/03/26/best-practices-for-using-google-sheets-in-your-data-project.html); Matthew Lincoln (2018).
* [Modeling as a core component of structuring data](https://iase-web.org/documents/SERJ/SERJ16(2)_Konold.pdf); Clifford Konold, William Finzer, Kosoom Kreetong (2017).
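
As a rough illustration of the idea behind the tidy-data reading above, the sketch below reshapes a small "wide" table (one column per year) into a "long" table with one observation per row. The data, the column names, and the `melt` helper are all hypothetical and written only for this example; they are not taken from the papers.

```python
# Hypothetical "wide" table: one row per site, one column per year.
wide = [
    {"site": "A", "2017": 10, "2018": 12},
    {"site": "B", "2017": 7, "2018": 9},
]


def melt(rows, id_key, var_name, value_name):
    """Turn each non-id column into its own row (wide -> long)."""
    tidy = []
    for row in rows:
        for key, value in row.items():
            if key == id_key:
                continue  # the id column is repeated on every output row
            tidy.append({id_key: row[id_key], var_name: key, value_name: value})
    return tidy


tidy = melt(wide, id_key="site", var_name="year", value_name="count")
for record in tidy:
    print(record)
```

In the long form, each variable (`site`, `year`, `count`) has its own column and each row is a single observation, which is usually easier to filter, group, and plot.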
#### Deep Learning
The following papers cover the "eve" of deep learning, when the field showed promise and began to take off. The Fast.ai video introduces the convolutional neural network (CNN), a milestone in deep learning.
* Deep Learning 2018 [Lesson 2: Convolutional Neural Networks](https://course.fast.ai/lessons/lesson2.html)
* [A fast learning algorithm for deep belief nets](http://www.cs.toronto.edu/~hinton/absps/ncfast.pdf); Hinton, Geoffrey E., Simon Osindero, and Yee-Whye Teh (2006).
* [Reducing the dimensionality of data with neural networks](http://www.cs.toronto.edu/~hinton/science.pdf); Hinton, Geoffrey E., and Ruslan R. Salakhutdinov (2006).
#### Statistical Learning
This chapter of ISL introduces linear regression, a tool for modeling relationships between two or more numerical variables.
* Introduction to Statistical Learning: Chapter 3 [[pdf]](http://www-bcf.usc.edu/~gareth/ISL/ISLR%20Seventh%20Printing.pdf)
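
To make the idea of regression concrete before reading the chapter, here is a minimal sketch of fitting a straight line to two numerical variables by ordinary least squares. It uses plain Python rather than the R used in ISL, and the `fit_line` helper and the toy data are assumptions made for this example only.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Toy data that lies exactly on the line y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

intercept, slope = fit_line(xs, ys)
print(f"intercept={intercept:.2f}, slope={slope:.2f}")
```

With real, noisy data the points will not fall exactly on the line, and the chapter discusses how to judge how well the fitted line describes them.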
### Future Weeks
I will publish new weeks as we work through the content. If there is interest in specific topics, I'll add them to upcoming weeks as I publish their content.
License
--