Spring 2020 elective: CS 496 : Special topics (Introduction to data science)



2. Tentative Spring 2020 elective: CS 496 : Special topics (Introduction to data science)
Description: An introduction to data science from a computer science perspective. Includes an introduction to the programming language used in the course, numerical computation, essential probability and Bayesian statistics concepts, structured and unstructured text and data processing, data and information visualization, machine learning using examples such as Naive Bayes, regression, decision trees, clustering, mixture models and topic modeling.

Prerequisites: Completion of at least two 300-level courses in computer science.

Textbook: Python Data Science Handbook: Essential tools for working with data. Jake VanderPlas. O'Reilly. ISBN-13: 978-1491912058.

Programming language: Python is popular in general and especially for data science, since it has many libraries and support tools, including interfaces to non-Python libraries. The course uses Python 3.x and covers some of the distinctive differences encountered when moving from Python 2.x to Python 3.x, since support for Python 2.x is to be discontinued in early 2020.

Data science topics: The field of data science brings together many different areas such as computer science, mathematics, statistics, machine learning, data/information visualization, information technology and business. The general outline of the course topics is as follows.

Professor: Dr. Snyder (PhD, computer science, applied programming language theory) has spent ten recent years working in industry for various companies on the analysis of complex structured and unstructured text, data and programs. That work includes real estate (various data feeds, AWS), intellectual property forensics (cluster computing, non-trivial file comparisons, etc.), and patent application writing related to topic modeling and market prediction. His visualization work includes the financial printing industry and visualizations using PostScript/Ghostscript, SVG, and Python with PIL and Matplotlib. Other related projects include sentiment analysis (of German comments), use of large data sets (Google patent database, Enron email database, USGS satellite mapping image data, etc.), and automatically inferring and categorizing characteristics of tabular data so that common probability distributions (binomial, Poisson, etc.) could be used to generate similar data for testing.

For more information, contact Dr. Snyder, rsnyder9@ycp.edu, KEC 115, ycp.powersoftwo.org.