Data clustering


1. Data clustering
Clustering is an important technique for grouping data. Some examples include the following.

2. Exact and approximate clustering
Clustering is based on some idea of what it means for two entities to be "equal" or, in most cases, "almost equal" in some sense.

We will look at the following.

3. Human visualization
Humans have a unique ability to abstract and recognize patterns and make abstract inferences from those recognized patterns.

4. Abstraction
Human brains are built for complex abstraction. What does that mean exactly?

In abstract art, something is taken away and something remains; one then needs to interpret what is meant or intended. To abstract is to take away the nonessential and thereby to ignore certain differences.

For most purposes, an abstraction is looking at similarities and ignoring differences. The similarity is what is the same. The difference is what is different.

5. Hoare: Abstraction
In simple terms, abstraction is looking at similarities and ignoring differences.

6. Higher level intelligence
Abstraction is the key to higher level intelligence. That is why so many questions are of the form, "What is the primary similarity and difference between ...".

Much of the work in computer science and programming languages involves looking at patterns in text and making abstractions.

7. Seeing and thinking
Kanizsa Triangle
How many triangles do you see?

8. Kanizsa triangle
Do you see the triangle?

Kanizsa Triangle
There is no triangle! Your brain makes the triangle that you see.

Abstraction involves looking at similarities and differences and filling in missing details - sometimes appropriately, sometimes inappropriately.

9. More triangles
How many triangles do you see now? Do they exist?

Kanizsa Triangle
To many viewers, the triangle that is seen, but does not exist, appears brighter than the surrounding area.

10. Gaetano Kanizsa
This type of illusion was discovered/created by Gaetano Kanizsa, who popularized such illusions in part through his 1976 Scientific American article on the subject (though he had been working on such ideas for many years before that article).

11. With dots
The triangle is still seen when just dots are present at the corners.

Kanizsa Triangle
Can a triangle that does not exist be "whiter than white"? White is white, right?

12. Necker cube
Necker Cube
Here is a Necker cube. Which corner is nearest to the viewer?

Necker Cube
Do you see the Necker cube now? It does not exist except in your mind.

13. Programming abstractions
In programming terms, to abstract is to replace one or more parts of a program with a name that refers to the replaced parts (thus hiding the details). Here are some programming constructs that are used for abstraction.

14. Programming language work
Much of the work that I assign for programming language assignments follows this general pattern. Here is the setup. Here is the requirement. By understanding A and B, including a lot of textual abstracting, one can then write/construct a program to do C.

15. Example programming abstraction
Consider the following pattern (which could go on more in an extended example).
Two times 0 is 0.
Two times 1 is 2.
Two times 2 is 4.
Two times 3 is 6.
Two times 4 is 8.
Two times 5 is 10.
Two times 6 is 12.
Two times 7 is 14.
Two times 8 is 16.
Two times 9 is 18.

A good programmer would immediately visually recognize the pattern and, if asked, could write a simple program such as the following to output that pattern. Here is the C code.

Here is the output of the C code.

A not-so-good programmer would not see the pattern and might attempt the following.

Here is the C code.

Here is the output of the C code.


16. Comparison in general
In both cases, the program has the same output.

But the first program has less redundancy (repetition) and is considered the better program.

In general, of two programs that produce the same effect, the smaller one is considered the better program.

17. Comparison in specific terms
Specifically, the better program is smaller but also minimizes any non-computer-checked redundancy.

That means that any parameter or concept that is important in the program and that could be changed should be changeable in one and only one place.

Note: If the computer is checking the redundancy (and not a human), then that redundancy is not necessarily bad. (Backup systems are redundant but useful.) And there is redundancy in a program that cannot be avoided (e.g., variables with the same name - when they represent the same or different memory locations).

18. Bloom's taxonomy
Bloom taxonomy
Bloom's taxonomy of educational objectives has a foundation of knowledge and remembering and a goal of abstract problem solving - evaluation and creating.

19. Music analogy
Think of music, playing a musical instrument, and creating a score of music to play. In computer terms, the musical analogy is as follows.

20. Learning music
How does one learn music? (After listening to music for a while).

An approach often used with beginning programmers in computer science is the following. Will that work well? How about the following?

21. Dimensions
Humans can easily visualize 2D or 3D in graphics but higher dimensions are harder to visualize.

In data science, one often learns concepts using examples in 2D or 3D and then generalizes via abstraction to many more dimensions.

Working in 2D or 3D can thus help one understand the method that then generalizes to higher dimensions.
