In this set of videos I'll talk about some ways of representing and visualizing data that are used very often when one is looking for power laws. This is building towards the idea of the cumulative distribution function and ranked frequency plots. But I'm going to start by talking about a simple histogram first.

So let's say we have some data. In practice you would have many points, but for this example I'm going to imagine we have only 10. And we want to know how these are distributed. Are they distributed according to a power law or something else? A common thing to do when you're faced with a big set of data and want to get a sense of where it lives, where most of the values are, is to make a histogram. In a histogram we just count the number of occurrences of data points within certain ranges.

So here's a set of axes, and here are those data points again. I chose, somewhat arbitrarily, 10, 20, 30, 40, 50, and so each range is going to go from 0 to 3 and a third, 3 and a third to 6 and two thirds, and so on. That's kind of an unusual bin size, but there's nothing wrong with that, and it works just fine.

So let's think about how we would build up a histogram. Alright, so there's one value that is 5. 5 falls in this region, so there's one in there, filling that box. 7 is between 6 and two thirds and 10, so there's one in there. Then 10 and 13. I take an endpoint to belong to the bin on its right, so we have two in this region: greater than or equal to 10 but less than 13 and a third. That's for 10, that's for 13. And I could keep going: 14; 18 is in there; 21 is a little to the right of 20; 28; 32; and 48 goes all the way over here.

OK, so this is a way of visualizing this data set. In particular, it lets us see whether there's any central tendency, what the range of values is, and so on. In practice one really wouldn't make a histogram with this small an amount of data. You'd probably want a lot more.
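
Here's a minimal sketch in Python of the counting just described, in case it helps to see it spelled out. It assumes the ten values are the ones read out above (5, 7, 10, 13, 14, 18, 21, 28, 32, 48) and that there are 15 bins of width 10/3 covering 0 up to 50. The names `BIN_WIDTH`, `NUM_BINS`, and `histogram_counts` are made up for this illustration.

```python
from fractions import Fraction

# The ten example values read out in the video.
data = [5, 7, 10, 13, 14, 18, 21, 28, 32, 48]

# Assumption: 15 bins of width 10/3 covering 0 up to 50.
# Fraction keeps the bin edges exact, so 10, 20, ... land exactly on an edge.
BIN_WIDTH = Fraction(10, 3)
NUM_BINS = 15


def histogram_counts(values, width=BIN_WIDTH, num_bins=NUM_BINS):
    """Count how many values fall in each bin [k*width, (k+1)*width)."""
    counts = [0] * num_bins
    for v in values:
        # Floor division picks the bin; a value sitting on an edge
        # (like 10) goes to the bin on its right.
        k = int(v // width)
        counts[k] += 1
    return counts


counts = histogram_counts(data)

# Print a rough text version of the histogram, one row per bin.
for k, c in enumerate(counts):
    left, right = k * BIN_WIDTH, (k + 1) * BIN_WIDTH
    print(f"[{float(left):5.2f}, {float(right):5.2f})  {'#' * c}")
```

The bin that starts at 10 gets two marks, one for 10 and one for 13, and every other occupied bin gets one. This sketch only handles values from 0 up to but not including 50, which is all this example needs.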
And you want to think about what size you use for the bins as well. But I'm just trying to illustrate the general idea. So this is the idea behind the histogram. What we'll see next is an alternate way of visualizing the same data, something called the cumulative distribution function.

But before we go to the cumulative distribution function, I realized there's one more thing I want to mention about the histogram. We can think of a histogram as measuring in terms of counts, but we can also turn it into a probability by dividing by the number of data points. We have 10 measurements, so I could interpret this as saying there is one value between 3 and a third and 6 and two thirds. It can also be interpreted as saying there is a one in 10 chance, if I were to draw from this data at random, that I would get something in that range, or simply that 10 percent of the data falls in here. So I could take this axis, divide by 10, and label it 0.1, 0.2, and this I might think of as p of x, the probability of x. There's a subtlety with this interpretation: this is one of those places where it matters whether we're thinking about a continuous or a discrete distribution. But in either case we can think of this more or less as the probability; that's a lowercase p. OK, so now let's move on to cumulative distribution functions.
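
To make the counts-to-probability step concrete, here is a small Python sketch. It assumes the same ten values and the same 15 bins of width 10/3 as the histogram example. The names `BIN_WIDTH`, `NUM_BINS`, and `probabilities` are just labels chosen for this illustration.

```python
from fractions import Fraction

# The ten example values from the histogram.
data = [5, 7, 10, 13, 14, 18, 21, 28, 32, 48]

# Assumption: 15 bins of width 10/3 covering 0 up to 50.
BIN_WIDTH = Fraction(10, 3)
NUM_BINS = 15

# Count the values in each bin [k*width, (k+1)*width).
counts = [0] * NUM_BINS
for v in data:
    counts[int(v // BIN_WIDTH)] += 1

# Divide each count by the number of data points to get p(x) per bin.
n = len(data)
probabilities = [Fraction(c, n) for c in counts]

# Show only the bins that actually contain data.
for k, p in enumerate(probabilities):
    if p > 0:
        left, right = k * BIN_WIDTH, (k + 1) * BIN_WIDTH
        print(f"[{float(left):5.2f}, {float(right):5.2f})  p = {float(p):.1f}")
```

A bin holding one value comes out as 0.1 and the bin holding both 10 and 13 comes out as 0.2, matching the relabeled axis. The probabilities over all bins add up to 1.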
