# Probabilidad

### Text-only Preview

LECTURE NOTES

Course 6.041-6.431

M.I.T.

FALL 2000

Introduction to Probability

Dimitri P. Bertsekas and John N. Tsitsiklis

Professors of Electrical Engineering and Computer Science

Massachusetts Institute of Technology

Cambridge, Massachusetts

These notes are copyright-protected but may be freely distributed for

instructional nonproﬁt pruposes.

Contents

1. Sample Space and Probability

. . . . . . . . . . . . . . . .

1.1. Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

1.2. Probabilistic Models . . . . . . . . . . . . . . . . . . . . . . .

1.3. Conditional Probability

. . . . . . . . . . . . . . . . . . . . .

1.4. Independence . . . . . . . . . . . . . . . . . . . . . . . . . .

1.5. Total Probability Theorem and Bayes’ Rule

. . . . . . . . . . . .

1.6. Counting

. . . . . . . . . . . . . . . . . . . . . . . . . . .

1.7. Summary and Discussion

. . . . . . . . . . . . . . . . . . . .

2. Discrete Random Variables

. . . . . . . . . . . . . . . . .

2.1. Basic Concepts . . . . . . . . . . . . . . . . . . . . . . . . .

2.2. Probability Mass Functions

. . . . . . . . . . . . . . . . . . .

2.3. Functions of Random Variables . . . . . . . . . . . . . . . . . .

2.4. Expectation, Mean, and Variance . . . . . . . . . . . . . . . . .

2.5. Joint PMFs of Multiple Random Variables . . . . . . . . . . . . .

2.6. Conditioning . . . . . . . . . . . . . . . . . . . . . . . . . .

2.7. Independence . . . . . . . . . . . . . . . . . . . . . . . . . .

2.8. Summary and Discussion

. . . . . . . . . . . . . . . . . . . .

3. General Random Variables

. . . . . . . . . . . . . . . . .

3.1. Continuous Random Variables and PDFs

. . . . . . . . . . . . .

3.2. Cumulative Distribution Functions

. . . . . . . . . . . . . . . .

3.3. Normal Random Variables . . . . . . . . . . . . . . . . . . . .

3.4. Conditioning on an Event

. . . . . . . . . . . . . . . . . . . .

3.5. Multiple Continuous Random Variables

. . . . . . . . . . . . . .

3.6. Derived Distributions . . . . . . . . . . . . . . . . . . . . . .

3.7. Summary and Discussion

. . . . . . . . . . . . . . . . . . . .

4. Further Topics on Random Variables and Expectations . . . . . .

4.1. Transforms . . . . . . . . . . . . . . . . . . . . . . . . . . .

4.2. Sums of Independent Random Variables - Convolutions

. . . . . . .

iii

iv

Contents

4.3. Conditional Expectation as a Random Variable . . . . . . . . . . .

4.4. Sum of a Random Number of Independent Random Variables

. . . .

4.5. Covariance and Correlation

. . . . . . . . . . . . . . . . . . .

4.6. Least Squares Estimation

. . . . . . . . . . . . . . . . . . . .

4.7. The Bivariate Normal Distribution

. . . . . . . . . . . . . . . .

5. The Bernoulli and Poisson Processes . . . . . . . . . . . . . .

5.1. The Bernoulli Process . . . . . . . . . . . . . . . . . . . . . .

5.2. The Poisson Process . . . . . . . . . . . . . . . . . . . . . . .

6. Markov Chains . . . . . . . . . . . . . . . . . . . . . . .

6.1. Discrete-Time Markov Chains

. . . . . . . . . . . . . . . . . .

6.2. Classiﬁcation of States . . . . . . . . . . . . . . . . . . . . . .

6.3. Steady-State Behavior . . . . . . . . . . . . . . . . . . . . . .

6.4. Absorption Probabilities and Expected Time to Absorption

. . . . .

6.5. More General Markov Chains . . . . . . . . . . . . . . . . . . .

7. Limit Theorems . . . . . . . . . . . . . . . . . . . . . . .

7.1. Some Useful Inequalities . . . . . . . . . . . . . . . . . . . . .

7.2. The Weak Law of Large Numbers . . . . . . . . . . . . . . . . .

7.3. Convergence in Probability . . . . . . . . . . . . . . . . . . . .

7.4. The Central Limit Theorem

. . . . . . . . . . . . . . . . . . .

7.5. The Strong Law of Large Numbers

. . . . . . . . . . . . . . . .

Preface

These class notes are the currently used textbook for “Probabilistic Systems

Analysis,” an introductory probability course at the Massachusetts Institute of

Technology. The text of the notes is quite polished and complete, but the prob-

lems are less so.

The course is attended by a large number of undergraduate and graduate

students with diverse backgrounds. Acccordingly, we have tried to strike a bal-

ance between simplicity in exposition and sophistication in analytical reasoning.

Some of the more mathematically rigorous analysis has been just sketched or

intuitively explained in the text, so that complex proofs do not stand in the way

of an otherwise simple exposition. At the same time, some of this analysis and

the necessary mathematical results are developed (at the level of advanced calcu-

lus) in theoretical problems, which are included at the end of the corresponding

chapter. The theoretical problems (marked by *) constitute an important com-

ponent of the text, and ensure that the mathematically oriented reader will ﬁnd

here a smooth development without major gaps.

We give solutions to all the problems, aiming to enhance the utility of

the notes for self-study. We have additional problems, suitable for homework

assignment (with solutions), which we make available to instructors.

Our intent is to gradually improve and eventually publish the notes as a

textbook, and your comments will be appreciated

Dimitri P. Bertsekas

bertsekas@lids.mit.edu

John N. Tsitsiklis

jnt@mit.edu

v

1

Sample Space and

Probability

Contents

1.1. Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . p. 3

1.2. Probabilistic Models . . . . . . . . . . . . . . . . . . . . p. 6

1.3. Conditional Probability

. . . . . . . . . . . . . . . . .

p. 16

1.4. Total Probability Theorem and Bayes’ Rule

. . . . . . . .

p. 25

1.5. Independence . . . . . . . . . . . . . . . . . . . . . .

p. 31

1.6. Counting

. . . . . . . . . . . . . . . . . . . . . . .

p. 41

1.7. Summary and Discussion

. . . . . . . . . . . . . . . .

p. 48

1

2

Sample Space and Probability

Chap. 1

“Probability” is a very useful concept, but can be interpreted in a number of

ways. As an illustration, consider the following.

A patient is admitted to the hospital and a potentially life-saving drug is

administered. The following dialog takes place between the nurse and a

concerned relative.

RELATIVE: Nurse, what is the probability that the drug will work?

NURSE: I hope it works, we’ll know tomorrow.

RELATIVE: Yes, but what is the probability that it will?

NURSE: Each case is diﬀerent, we have to wait.

RELATIVE: But let’s see, out of a hundred patients that are treated under

similar conditions, how many times would you expect it to work?

NURSE (somewhat annoyed): I told you, every person is diﬀerent, for some

it works, for some it doesn’t.

RELATIVE (insisting): Then tell me, if you had to bet whether it will work

or not, which side of the bet would you take?

NURSE (cheering up for a moment):

I’d bet it will work.

RELATIVE (somewhat relieved):

OK, now, would you be willing to lose

two dollars if it doesn’t work, and gain one dollar if it does?

NURSE (exasperated):

What a sick thought! You are wasting my time!

In this conversation, the relative attempts to use the concept of probability to

discuss an uncertain situation. The nurse’s initial response indicates that the

meaning of “probability” is not uniformly shared or understood, and the relative

tries to make it more concrete. The ﬁrst approach is to deﬁne probability in

terms of frequency of occurrence, as a percentage of successes in a moderately

large number of similar situations. Such an interpretation is often natural. For

example, when we say that a perfectly manufactured coin lands on heads “with

probability 50%,” we typically mean “roughly half of the time.” But the nurse

may not be entirely wrong in refusing to discuss in such terms. What if this

was an experimental drug that was administered for the very ﬁrst time in this

hospital or in the nurse’s experience?

While there are many situations involving uncertainty in which the fre-

quency interpretation is appropriate, there are other situations in which it is

not. Consider, for example, a scholar who asserts that the Iliad and the Odyssey

were composed by the same person, with probability 90%. Such an assertion

conveys some information, but not in terms of frequencies, since the subject is

a one-time event. Rather, it is an expression of the scholar’s subjective be-

lief. One might think that subjective beliefs are not interesting, at least from a

mathematical or scientiﬁc point of view. On the other hand, people often have

to make choices in the presence of uncertainty, and a systematic way of making

use of their beliefs is a prerequisite for successful, or at least consistent, decision

Sec. 1.1

Sets

3

making.

In fact, the choices and actions of a rational person, can reveal a lot about

the inner-held subjective probabilities, even if the person does not make conscious

use of probabilistic reasoning. Indeed, the last part of the earlier dialog was an

attempt to infer the nurse’s beliefs in an indirect manner. Since the nurse was

willing to accept a one-for-one bet that the drug would work, we may infer

that the probability of success was judged to be at least 50%. And had the

nurse accepted the last proposed bet (two-for-one), that would have indicated a

success probability of at least 2/3.

Rather than dwelling further into philosophical issues about the appropri-

ateness of probabilistic reasoning, we will simply take it as a given that the theory

of probability is useful in a broad variety of contexts, including some where the

assumed probabilities only reﬂect subjective beliefs. There is a large body of

successful applications in science, engineering, medicine, management, etc., and

on the basis of this empirical evidence, probability theory is an extremely useful

tool.

Our main objective in this book is to develop the art of describing un-

certainty in terms of probabilistic models, as well as the skill of probabilistic

reasoning. The ﬁrst step, which is the subject of this chapter, is to describe

the generic structure of such models, and their basic properties. The models we

consider assign probabilities to collections (sets) of possible outcomes. For this

reason, we must begin with a short review of set theory.

1.1 SETS

Probability makes extensive use of set operations, so let us introduce at the

outset the relevant notation and terminology.

A set is a collection of objects, which are the elements of the set. If S is

a set and x is an element of S, we write x ∈ S. If x is not an element of S, we

write x /

∈ S. A set can have no elements, in which case it is called the empty

set, denoted by Ø.

Sets can be speciﬁed in a variety of ways. If S contains a ﬁnite number of

elements, say x1, x2, . . . , xn, we write it as a list of the elements, in braces:

S = {x1, x2, . . . , xn}.

For example, the set of possible outcomes of a die roll is {1, 2, 3, 4, 5, 6}, and the

set of possible outcomes of a coin toss is {H, T }, where H stands for “heads”

and T stands for “tails.”

If S contains inﬁnitely many elements x1, x2, . . ., which can be enumerated

in a list (so that there are as many elements as there are positive integers) we

write

S = {x1, x2, . . .},

4

Sample Space and Probability

Chap. 1

and we say that S is countably inﬁnite. For example, the set of even integers

can be written as {0, 2, −2, 4, −4, . . .}, and is countably inﬁnite.

Alternatively, we can consider the set of all x that have a certain property

P , and denote it by

{x | x satisﬁes P }.

(The symbol “|” is to be read as “such that.”) For example the set of even

integers can be written as {k | k/2 is integer}. Similarly, the set of all scalars x

in the interval [0, 1] can be written as {x | 0 ≤ x ≤ 1}. Note that the elements x

of the latter set take a continuous range of values, and cannot be written down

in a list (a proof is sketched in the theoretical problems); such a set is said to be

uncountable.

If every element of a set S is also an element of a set T , we say that S

is a subset of T , and we write S ⊂ T or T ⊃ S. If S ⊂ T and T ⊂ S, the

two sets are equal, and we write S = T . It is also expedient to introduce a

universal set, denoted by Ω, which contains all objects that could conceivably

be of interest in a particular context. Having speciﬁed the context in terms of a

universal set Ω, we only consider sets S that are subsets of Ω.

Set Operations

The complement of a set S, with respect to the universe Ω, is the set {x ∈

Ω | x /

∈ S} of all elements of Ω that do not belong to S, and is denoted by Sc.

Note that Ωc = Ø.

The union of two sets S and T is the set of all elements that belong to S

or T (or both), and is denoted by S ∪ T . The intersection of two sets S and T

is the set of all elements that belong to both S and T , and is denoted by S ∩ T .

Thus,

S ∪ T = {x | x ∈ S or x ∈ T },

S ∩ T = {x | x ∈ S and x ∈ T }.

In some cases, we will have to consider the union or the intersection of several,

even inﬁnitely many sets, deﬁned in the obvious way. For example, if for every

positive integer n, we are given a set Sn, then

∞

Sn = S1 ∪ S2 ∪ · · · = {x | x ∈ Sn for some n},

n=1

and

∞

Sn = S1 ∩ S2 ∩ · · · = {x | x ∈ Sn for all n}.

n=1

Two sets are said to be disjoint if their intersection is empty. More generally,

several sets are said to be disjoint if no two of them have a common element. A

collection of sets is said to be a partition of a set S if the sets in the collection

are disjoint and their union is S.

# Document Outline

- ÿ
- ÿ
- ÿ
- ÿ

- ÿ
- ÿ
- ÿ
- ÿ
- ÿ

- ÿ
- ÿ
- ÿ
- ÿ

- ÿ
- ÿ
- ÿ
- ÿ

- ÿ

- ÿ
- ÿ
- ÿ

- ÿ
- ÿ
- ÿ