Statistical Data Management in MATLAB

In the Apr-08-2007 posting, Getting Data Into MATLAB Using textread, basic use of the textread function was explained, and I alluded to code that I used to load the variable names. The name-handling code was not included in that post, and reader Andy asked about it. The code in question appears below, and an explanation follows (apologies for the Web formatting).
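
The MATLAB code discussed in this post is not reproduced in this excerpt. Purely as a hypothetical illustration of the idea (take the variable names from a delimited file's first line and the numeric data from the lines that follow), here is a small Python sketch. The function name, the comma delimiter and the all-numeric data are assumptions; this is not the author's textread-based code.

```python
def read_table_with_header(text, delimiter=","):
    """Split delimited text into variable names (first line) and numeric columns.

    Hypothetical helper: assumes every data row holds one numeric value per name.
    """
    rows = [line for line in text.splitlines() if line.strip()]  # skip blank lines
    names = [name.strip() for name in rows[0].split(delimiter)]
    columns = {name: [] for name in names}
    for row in rows[1:]:
        for name, value in zip(names, row.split(delimiter)):
            columns[name].append(float(value))
    return names, columns


# Usage: the variable names come from the header, the data from the remaining lines.
names, columns = read_table_with_header("age,income\n34,52000\n41,61000\n")
print(names)              # ['age', 'income']
print(columns["income"])  # [52000.0, 61000.0]
```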

Data Mining in MATLAB, May 13, 12:02am

Linear Discriminant Analysis (LDA)

Data Mining in MATLAB, May 13, 12:02am

Reader Question: Putting Entropy to Work

Data Mining in MATLAB, May 13, 12:02am

Putting PCA to Work

Data Mining in MATLAB, May 13, 12:02am

Principal Components Analysis

Data Mining in MATLAB, May 13, 12:02am

Single Neuron Training: The Delta Rule

I have recently put together a routine, DeltaRule, to train a single artificial neuron using the delta rule. DeltaRule can be found at MATLAB Central.
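
The DeltaRule routine itself is MATLAB code hosted at MATLAB Central and is not shown here. For readers unfamiliar with the technique, a minimal Python sketch of the delta rule (also known as the Widrow-Hoff or LMS rule) follows. It assumes a single neuron with a linear activation and a bias term; the function name and parameters are hypothetical and are not DeltaRule's interface.

```python
import random


def train_delta_rule(inputs, targets, learning_rate=0.05, epochs=200, seed=0):
    """Train one linear neuron y = w.x + b with the delta (Widrow-Hoff / LMS) rule.

    After each example, every weight moves by learning_rate * (target - output) * input.
    """
    rng = random.Random(seed)
    weights = [rng.uniform(-0.5, 0.5) for _ in range(len(inputs[0]))]
    bias = 0.0
    order = list(range(len(inputs)))
    for _ in range(epochs):
        rng.shuffle(order)  # present the examples in a fresh random order each epoch
        for i in order:
            x, target = inputs[i], targets[i]
            output = sum(w * xi for w, xi in zip(weights, x)) + bias
            error = target - output
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias


# Usage: recover t = 2*x1 - 3*x2 + 1 from a small noise-free grid of points.
points = [(a / 2, b / 2) for a in range(-2, 3) for b in range(-2, 3)]
targets = [2 * x1 - 3 * x2 + 1 for x1, x2 in points]
weights, bias = train_delta_rule(points, targets)
print([round(w, 3) for w in weights], round(bias, 3))
```

With noise-free data and a small enough learning rate, the weights settle on the exact coefficients; with noisy data they hover near the least-squares solution instead.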

Data Mining in MATLAB, May 13, 12:02am

MATLAB Gaining in Popularity!

Every month, TIOBE Software publishes its TIOBE Programming Community Index, a measure of programming language popularity based on a variety of sources. The current list, TIOBE Programming Community Index for July 2009, lists MATLAB as entering the Top 20 (for the first time, I believe). While no such ranking is likely to be perfect, TIOBE seems to be one of the more comprehensive efforts of this sort.

Data Mining in MATLAB, May 13, 12:02am

Getting Started with MATLAB

I am occasionally asked for introductory MATLAB materials. The only posts I've written here which I'd consider "introductory" are somewhat specialized: ...

Data Mining in MATLAB, May 13, 12:02am

Guest Post on Blinkdagger

Readers of this Web log may be interested in An Introduction to Combinatorics, an article on the perms, randperm and nchoosek functions, which I authored as a guest of Blinkdagger. Blinkdagger covers MATLAB programming, among other things, and I suggest you have a look.
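
For a quick feel for what those three MATLAB functions compute, here are rough Python standard-library counterparts: perms lists every ordering of a set of items, randperm draws one random ordering of 1..n, and nchoosek either counts or lists the ways to choose k items. The mapping is approximate; for instance, row order and random streams differ from MATLAB's.

```python
import itertools
import math
import random

items = [1, 2, 3]

# Every ordering of the items (MATLAB's perms also lists all orderings, in a different row order).
all_orders = list(itertools.permutations(items))

# One random ordering of 1..5, similar in purpose to randperm(5).
shuffled = random.Random(0).sample(range(1, 6), 5)

# Number of ways to choose 2 of 5 items, the value nchoosek(5, 2) returns.
count = math.comb(5, 2)

# Every 2-element combination of the items, as nchoosek(items, 2) lists them.
pairs = list(itertools.combinations(items, 2))

print(len(all_orders), sorted(shuffled), count, pairs)
```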

Data Mining in MATLAB, May 13, 12:02am

A Quick Introduction to Monte Carlo and Quasi-Monte Carlo Integration

In a surprising range of circumstances, it is necessary to calculate the area or volume of a region. When the region is a simple shape, such as a rectangle or triangle, and its exact dimensions are known, this is easily accomplished through standard geometric formulas. Often in practice, however, the region's shape is irregular and perhaps of very high dimension. Some regions used in financial net present value calculations, for instance, may lie in spaces defined by hundreds of dimensions!
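
The sampling idea behind these methods can be sketched in a few lines of Python. Assuming the region fits inside the square [-1, 1]², its area is estimated as the fraction of sample points that land inside it, times the square's area; here the region is the unit disk, whose true area is π. The quasi-Monte Carlo variant swaps the pseudo-random points for a Halton sequence (bases 2 and 3), one common low-discrepancy choice. The function names are illustrative only.

```python
import math
import random


def in_unit_disk(x, y):
    """Indicator of the region whose area we want: the disk of radius 1."""
    return x * x + y * y <= 1.0


def monte_carlo_area(indicator, n_samples, seed=0):
    """Estimate the area of a region inside [-1, 1]^2 using pseudo-random points."""
    rng = random.Random(seed)
    hits = sum(indicator(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n_samples))
    return 4.0 * hits / n_samples  # fraction of hits times the square's area (4)


def radical_inverse(i, base):
    """The i-th element of the van der Corput sequence in the given base."""
    result, scale = 0.0, 1.0 / base
    while i > 0:
        i, digit = divmod(i, base)
        result += digit * scale
        scale /= base
    return result


def quasi_monte_carlo_area(indicator, n_samples):
    """Same estimate, but with 2-D Halton points (bases 2 and 3) instead of random ones."""
    hits = sum(
        indicator(2 * radical_inverse(i, 2) - 1, 2 * radical_inverse(i, 3) - 1)
        for i in range(1, n_samples + 1)
    )
    return 4.0 * hits / n_samples


mc = monte_carlo_area(in_unit_disk, 100_000)
qmc = quasi_monte_carlo_area(in_unit_disk, 10_000)
print(f"pi = {math.pi:.5f}, Monte Carlo = {mc:.5f}, quasi-Monte Carlo = {qmc:.5f}")
```

The same hit-or-miss approach extends to more dimensions simply by drawing more coordinates per sample, and the plain Monte Carlo error shrinks roughly like 1/sqrt(n) whatever the dimension.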

Data Mining in MATLAB, May 13, 12:02am

Quasi-Random Numbers

Many computer users, whether MATLAB programmers or not, are familiar with random numbers. Strictly speaking, the "random" numbers most often encountered on computers are known as pseudo-random numbers. Pseudo-random numbers are not actually "random" at all, as they are deterministically generated in a completely repeatable fashion using one of a number of algorithms called pseudo-random number generators ("PRNG", if you want to impress your friends with lingo). Pseudo-random numbers are designed to mimic specific statistical distributions, most often the uniform distribution or, somewhat less commonly, the normal distribution.
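
To make the "deterministic and repeatable" point concrete, here is a toy linear congruential generator, one of the oldest PRNG families, written in Python. It is not the generator MATLAB uses by default; the constants are the widely published Numerical Recipes values, chosen here purely for illustration. Two generators started from the same seed produce exactly the same sequence.

```python
class LinearCongruentialGenerator:
    """Minimal pseudo-random number generator: state = (a * state + c) mod m.

    Constants default to the Numerical Recipes choice; a given seed fully
    determines every number the generator will ever produce.
    """

    def __init__(self, seed, a=1664525, c=1013904223, m=2**32):
        self.a, self.c, self.m = a, c, m
        self.state = seed % m

    def next_uniform(self):
        """Advance the state and scale it into [0, 1)."""
        self.state = (self.a * self.state + self.c) % self.m
        return self.state / self.m


first = LinearCongruentialGenerator(seed=2009)
second = LinearCongruentialGenerator(seed=2009)
run_a = [first.next_uniform() for _ in range(5)]
run_b = [second.next_uniform() for _ in range(5)]
print(run_a == run_b)  # True: same seed, same "random" numbers
```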

Data Mining in MATLAB, May 13, 12:02am

L1LinearRegression Code Update

The L-1 regression routine, L1LinearRegression, originally mentioned in the Oct-23-2007 posting, L-1 Linear Regression, has been updated. The old version produced correct results, but the new one is more efficient.
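
L1LinearRegression's own algorithm is not described in this excerpt. As a hedged illustration of what L-1 (least absolute deviations) regression does, the Python sketch below fits a straight line by iteratively reweighted least squares, one common approach that is not necessarily the one L1LinearRegression uses. The toy data lie exactly on y = 1 + 2x except for one gross outlier, which pulls the least-squares line much further than the L-1 line.

```python
def weighted_line_fit(xs, ys, ws):
    """Weighted least-squares fit of y = intercept + slope * x."""
    total = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / total
    y_bar = sum(w * y for w, y in zip(ws, ys)) / total
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def l1_line_fit(xs, ys, iterations=200, eps=1e-8):
    """Least-absolute-deviations line fit via iteratively reweighted least squares.

    Each pass refits with weights 1 / |residual|, so points far from the line count less.
    """
    weights = [1.0] * len(xs)  # the first pass is ordinary least squares
    for _ in range(iterations):
        intercept, slope = weighted_line_fit(xs, ys, weights)
        residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
        weights = [1.0 / max(abs(r), eps) for r in residuals]  # eps avoids division by zero
    return intercept, slope


xs = list(range(10))
ys = [1 + 2 * x for x in xs]
ys[-1] += 50  # one gross outlier
ols = weighted_line_fit(xs, ys, [1.0] * len(xs))
lad = l1_line_fit(xs, ys)
print("least squares (intercept, slope):", ols)
print("L-1 (intercept, slope):", lad)
```

Because L-1 regression minimizes the sum of absolute residuals rather than squared ones, a single wild y value has far less influence on the fitted line.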

Data Mining in MATLAB, May 13, 12:02am

MATLAB Image Basics

Data Mining in MATLAB, May 13, 12:02am

Introduction to Conditional Entropy

Data Mining in MATLAB, May 13, 12:02am

Validating Predictive Models

Data Mining in MATLAB, May 13, 12:02am

50,000 Visitors and Counting

At 9:32 AM local time today, this Web log received its 50,000th visitor, which I consider a significant milestone. Visitation continues to trend upward, and this month (not yet complete) already shows the highest number of visitors so far. Recently, I also learned that this log appears very near the top of some Google search results (which is only one way to measure success). For example, a posting here is the second item returned when searching for Mahalanobis distance.

Data Mining in MATLAB, May 13, 12:02am

Generating Hexagonal Grids for Fun and Profit

Grids are used for a variety of purposes in data analysis, such as division of physical areas into equal-sized units, or for data visualization. Some clustering techniques, such as Kohonen's Self-Organizing Map, use grids to organize their internal structure.
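
One common way to lay out a hexagonal grid is to shift every other row by half the point spacing and place rows sqrt(3)/2 spacings apart. The Python sketch below follows that assumption; it is not necessarily the construction used in the full post. Every interior point then sits exactly one spacing from its six nearest neighbors, the kind of six-neighbor layout used by hexagonal Self-Organizing Maps.

```python
import math


def hexagonal_grid(n_rows, n_cols, spacing=1.0):
    """Return (x, y) centers of an offset-row hexagonal grid, row by row.

    Odd rows are shifted right by half a spacing, and rows are spacing * sqrt(3) / 2
    apart, so interior points are exactly `spacing` from their six nearest neighbors.
    """
    row_height = spacing * math.sqrt(3) / 2
    points = []
    for r in range(n_rows):
        x_offset = spacing / 2 if r % 2 else 0.0
        for c in range(n_cols):
            points.append((c * spacing + x_offset, r * row_height))
    return points


# Usage: count the nearest neighbors of an interior point (row 2, column 2).
grid = hexagonal_grid(n_rows=5, n_cols=5)
center = grid[2 * 5 + 2]
neighbors = [p for p in grid if p != center and math.isclose(math.dist(p, center), 1.0)]
print(len(neighbors))  # 6
```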

Data Mining in MATLAB, May 13, 12:02am

Status Update: Mar-2009

This is just a short note to let everyone know that I have been working (finally) to restore the broken links on this Web log. I believe that the source code links have all now been fixed.

Data Mining in MATLAB, May 13, 12:02am

Logistic Regression

Data Mining in MATLAB, May 13, 12:02am

MATLAB 2009a

MATLAB is on the move. Release 2009a brings a number of changes. If you hadn't noticed, the random number generators in the base product had already begun to change as of the last release, and several functions (min, max, sum and prod, as well as several of the FFT functions) are now multi-threaded. This release also includes several changes to the analytical toolboxes. Among others...

Data Mining in MATLAB, May 13, 12:02am

Introduction to the Percentile Bootstrap

Data Mining in MATLAB, May 13, 12:02am

Parallel Programming: A First Look

Data Mining in MATLAB, May 13, 12:02am

Parallel Programming: Another Look

Data Mining in MATLAB, May 13, 12:02am

Getting Data Into and Out of Excel

Data Mining in MATLAB, May 13, 12:02am

Rexer Analytics' 2009 Data Miner Survey

I'd like to alert readers to Rexer Analytics' 2009 Data Miner Survey. I urge you to participate by visiting the on-line survey at: ...

Data Mining in MATLAB, May 13, 12:02am

The Judgement of Watson: Mathematics Wins!

Tom Davenport argues in the Harvard Business Review article "Why I'm Pulling for Watson" that ...

Data Mining and Predictive Analytics, May 13, 12:02am

Predictive Analytics Innovation

The Predictive Analytics Summit, a relative newcomer to the predictive analytics conference circuit, will be held in San Diego on Feb 24-25. At the first Summit, in San Francisco last fall, I enjoyed several of the talks and the networking. This time I will be presenting a fraud detection case study.

Data Mining and Predictive Analytics, May 13, 12:02am

Webinar with James Taylor -- 10 Best Practices in Operational Analytics

I'll be presenting a webinar with James Taylor this Wednesday at 10 AM PST entitled "10 Best Practices in Operational Analytics".

Data Mining and Predictive Analytics, May 13, 12:02am

Predictive Analytics World Early-bird ends Monday

The early-bird special for Predictive Analytics World San Francisco ends January 31, 2011. It saves you $200 on the conference rate and $100 on any workshop, including my Hands-On Predictive Analytics using SAS Enterprise Miner workshop on March 17th.

Data Mining and Predictive Analytics, May 13, 12:02am

Do analytics books sell?

Kevin Hillstrom has a fascinating post, "Amazon Singles", on his MineThatData blog about brief, technical ebooks (Amazon Singles) sold on Amazon. His points: interesting content is what sells; length doesn't matter, though these ebooks are typically under 50 pages; and price doesn't matter.

Data Mining and Predictive Analytics, May 13, 12:02am

Doing Data Mining Out of Order

I like the CRISP-DM process model for data mining; I teach from it and use it on my projects. I routinely recommend it to practitioners and managers as an aid during any data mining project. However, while I generally follow its sequence of steps, I don't always. Data mining often requires more creativity and "art" in reworking the data than we would like. It would be very nice if we could create a checklist and just run through it on every project! Unfortunately, data doesn't always cooperate in this way, so we need to adapt to the specific problems in each data set in order to prepare it well.

Data Mining and Predictive Analytics, May 13, 12:02am

The Power of Prescience: Achieving Lift with Predictive Analytics

I'll be participating in tomorrow's DM Radio broadcast, "The Power of Prescience: Achieving Lift with Predictive Analytics", on Thursday, Feb 23 at 3 pm ET. The best practices that we will be discussing include: ...

Data Mining and Predictive Analytics, May 13, 12:02am

Number of Hidden Layer Neurons to Use

In the linkedin.com Artificial Neural Networks group, a question arose about how many hidden neurons one should choose. I've never found a fully satisfactory answer to this, but there are quite a lot of guesses and rules of thumb out there.

Data Mining and Predictive Analytics, May 13, 12:02am

Rexer Analytics data mining survey

Rexer Analytics, a data mining consulting firm, is conducting their 5th annual survey of the analytic behaviors, views and preferences of data mining professionals. I urge all of you to respond to the survey and help us all better understand the nature of the data mining and predictive analytics industry. The following text contains their instructions and overview.

Data Mining and Predictive Analytics, May 13, 12:02am

Predictive Models are not Statistical Models — JT on EDM

This post first appeared as "Predictive Models are not Statistical Models" on JT on EDM.

Data Mining and Predictive Analytics, May 13, 12:02am

Analyzing the Results of Analysis

The output of analytical tools can be voluminous and complicated, and making sense of it sometimes requires, well, analysis. Following are two examples of applying our tools to their own output.

Data Mining and Predictive Analytics, May 13, 12:02am

Top 5 Posts from 2011

By far, the most visited post of 2011 was the "What Do Data Miners Need to Learn" post from June.

Data Mining and Predictive Analytics, May 13, 12:02am

Statistical Rules of Thumb, part III: Always Visualize the Data

As I perused Statistical Rules of Thumb again, as I do from time to time, I came across this gem. (Note: I live in CA, so I get no money from these Amazon links.)

Data Mining and Predictive Analytics, May 13, 12:02am

Yet another "Wisdom of Crowds" success

I was at the Federal Building in downtown San Diego for a consulting job and met some representatives of a life and disability insurance company who were giving away a big-screen HD TV to whoever came closest to guessing the number of M&Ms (chocolate and peanut butter filled) in a container. Because they do this often, I won't show the specific container they use.

Data Mining and Predictive Analytics, May 13, 12:02am

Statistical Rules of Thumb, part II

A while back, Will Dwinnell posted on two books, one of which is one of my favorites as well: ...

Data Mining and Predictive Analytics, May 13, 12:02am

What do Data Miners Need to Learn?

I've been asked by several folks recently what they need to learn to succeed in data mining and predictive analytics. This is a different twist on a question I also get, namely what degree one should get to be a good (albeit "green") data miner. Usually, the latter question gets the answer "it doesn't matter" because I know so many great data miners without a statistics or mathematics degree. Admittedly, many non-stats/math degrees have a very strong statistics or mathematics component, such as psychology, political science, and engineering, to name a few. But then again, you don't necessarily have to load up on stats/math courses in these disciplines either.

Data Mining and Predictive Analytics, May 13, 12:02am

Statistics: The Need for Integration

I'd like to revisit an issue we covered here way back in 2007, in Statistics: Why Do So Many Hate It? Recent comments made to me, both in private conversation ("Statistics? I hated that class in college!") and in print, prompt me to reconsider this issue.

Data Mining and Predictive Analytics, May 13, 12:02am

Target, Pregnancy, and Predictive Analytics, Part II

This is part II of my thoughts on the New York Times article "How Companies Learn Your Secrets".

Data Mining and Predictive Analytics, May 8, 12:00am

Target, Pregnancy, and Predictive Analytics, Part I

There has been a plethora of tweets about the New York Times article "How Companies Learn Your Secrets", mostly focused on the story of how Target can predict whether a customer is pregnant. Most of the tweets I've seen react to this as somewhat creepy or invasive. I may write more on this topic at some future time (which probably means I won't!) because I don't find it creepy at all that a company would try to understand my behavior and infer the cause of that behavior. But I digress…

Data Mining and Predictive Analytics, May 7, 12:01pm

Dilbert, Database marketing and spam

Ruben's comment about spam reminded me of an old Dilbert comic that conveys the misconception about database marketing (e-marketing) and spam.

Data Mining and Predictive Analytics, May 5, 12:02pm

Models Behaving Badly

I just read a fascinating book review in the Wall Street Journal, Physics Envy: Models Behaving Badly. The author of the book, Emanuel Derman (former head of Quantitative Analysis at Goldman Sachs), argues that the financial models involved human beings and therefore were inherently brittle: as human behavior changed, the models failed. "In physics you're playing against God, and He doesn't change His laws very often. In finance, you're playing against God's creatures."

Data Mining and Predictive Analytics, May 4, 12:03pm

Predictive Analytics World Had the Target Story First

The New York Times Magazine article "How Companies Learn Your Secrets" by Charles Duhigg, with its key descriptions of Target, pregnancy, and predictive analytics (which I blogged about here and here), certainly generated a lot of buzz; if you are unable to see the NYTimes Magazine article, Forbes has a good summary. However, few know that Eric Siegel booked Andy Pole as a keynote speaker for the October 2010 Predictive Analytics World conference. The full video of that talk is ...

Data Mining and Predictive Analytics, May 3, 12:02am

What I'm Working On

Sometimes folks ask me what I'm doing, so I thought I'd share a few things on my plate right now: ...

Data Mining and Predictive Analytics, May 2, 12:04pm

Why Defining the Target Variable in Predictive Analytics is Critical

Every data mining project begins with defining what problem will be solved. I won't describe the CRISP-DM process here, but I use that general framework often when working with customers so they have an idea of the process.

Data Mining and Predictive Analytics, May 1, 12:03pm