Black Holes, Peach Trees, Towels, and a Keychain

Write a short story based on those four keywords. Seriously (prompt suggested by a friend).

It had been exactly 26 hours, 4 minutes, and 53 seconds since Liam Prescott had last gotten out of bed, and it would be another half-day or so before he would even think about climbing back in.

A time-release caffeine dose, its dosage carefully determined over the fortnight before, along with a life-shortening concoction of fatigue- and anxiety-reducing chemistry, had been self-administered from the moment he woke up, all in an effort to accelerate the completion of an absurd scientific goal: what was now the ninth in a series of failed attempts to bridge the lifeless reality of his underground laboratory to another distant location in the universe.

Well, Liam himself would describe the vibrant, verdant, meticulously constructed lair as “lifeless,” but any outsider (if any outsider were able to find it, that is) would probably jump to tell you that the advanced facility, with its metal and glass interior, dense flora, bright interfaces, and alien-looking technology, had to be literally anything but.

Still, Liam would now consider it lifeless, as he had every day for the last twelve years.

Liam swiveled about in his chair and promptly flipped a series of switches whose positions had long been committed to muscle memory. One was labeled “U POWER,” another “LP INTERFACE,” and the last, a flashing red one, “IGNITION.”

He opened his day-pack to retrieve two objects in a single motion. The first was a small device, deceptively simple in its controls, and the other was a sort of keychain brooch attached to the device on a loop that looked hastily machined to the top.

Liam slowly folded open the circular, brushed silver brooch to find the last remaining physical memory of his lost soul mate, as he had done all eight attempts before.

“Jane, this is for you.”

The recently appointed Dr. Liam Prescott had never gone shopping for home decor before, but then again, neither had his newly-wedded wife.

Jane Prescott spoke with a voice that was soft but sure, “Liam, I really like all these pretty, modern designs we’ve got, but we could really do with some plants too.”

Liam was not in the business of disagreeing with his beautiful wife, a genius plant biologist in her own right, and likely a much better manager of home décor.

“What kind were you thinking?”

Jane’s eyes lit up the same way a child’s might when told their curfew has been abolished for New Year’s Eve. “Well, we have plenty of extra space down there. I was thinking of maybe starting a tree garden.”

And so, like clockwork, the couple immediately set off to acquire 50 dozen peach tree cuttings, 51 dozen ceramic pots (in case some broke in transit), 2 metric tons of soil, a lifetime’s supply of fertilizer, and all the necessary components that they would modify and augment into the cutting-edge system behind the first self-watering, self-picking, self-sustaining superfarm of the Prescott Family Advanced Underground Research and Design Facility.

“Wow.”

“It’s gorgeous, right?” Liam glanced at his wife with bright eyes and then back at their recent accomplishment. “And the trees are pretty neat too, I suppose.”

Jane giggled. “God damn it, Liam.”

Betraying her previous statement, Jane took her new husband by the hand and turned to face the endless rows of peach saplings.

“It’s still missing the final touch.”

Jane opened her bag and started towards the nearest tree. She began to lay a set of meticulously embroidered doilies, each surrounding a single sprout in the superfarm. The widths of their openings were much greater than the trunks they now encircled, but the idea was that they would all eventually grow to fill out their embroidery.

Liam let out one final protest of the strange idea, mostly in jest, “They look like towels, Jane.”

“Oh shut up.”

Each set was meant to detail a significant event in their lives, from early childhood to college to their marriage; it was all carefully recorded in the cloth of the trees. She had hoped, perhaps naively so, that they would eventually get to see all the trees grow in due time, filling out the tapestry of encoded history in homage to their life together. Jane left a few dozen unclothed trees for future special moments to be embroidered.

She placed the final doily; its winding, clumsily sewn threads depicted the scene of the first time the pair had met: two bright, young minds colliding in the MIT Jameson Biophysics Computation Laboratory. They sat on a bench outside the building, sharing the exhaustive list of things they hoped to accomplish one day.

Jane finished and walked back, and Liam embraced his wife in his arms for the last time.


He closed the brooch, which was filled with a cutting of her face from the lovingly embroidered scene of that fateful encounter, rather unable to handle the raw emotion of the first and last day they had spent together as a wedded couple.

“I’m so sorry. It was my fault,” Liam said aloud, unaware that he had even done so.

The “towels” of the trees had since been mostly filled, and a few had even stretched until they’d snapped. The first broken memory corresponded (not coincidentally) with the first time Liam decided he would attempt the impossible: to retrieve the woman he lost in what could only be described as a bizarre collision of unfortunate circumstance and insane coincidence during a breakthrough-of-the-century scientific experiment—one whose theoretical formalism was only just beginning to be developed by a select body of fourteen appallingly intelligent applied mathematicians and theoretical physicists the world over.

Still, one decade and two years ago, Dr. Liam Prescott had alone managed to uncover the secret to generating negative energy densities, to produce sufficient quantities of exotic matter to contain them, and to construct the necessary bounding apparatus that (he had thought) would be enough to safely assemble and maintain an actual, miniature wormhole for study in the underground depths of the Prescott laboratory. He was almost correct.

But that miscalculation, slight as it was in nature, would go on to haunt his memory forever.

Soon after the elusive Einstein-Rosen bridge had successfully taken shape, Liam at once found himself utterly unable to—for lack of a better term with any remotely understandable analogy—”close” it before the growing vacuum swallowed his better half, along with much of the peach tree farm they’d built together and the facility’s advanced machinery, in its forceful, relentless pull.

Their destination, he had worked out, was a planet orbiting in the habitable zone of the red dwarf star Kepler-186, a distant 151 parsecs from Earth. Despite this astronomically lucky terminus for a living, breathing human being, the survival of his wife after all this time appeared, at best, a distant pipe dream, one that continued to dwindle by the day.

Eight times already he had tried; eight attempts at reproducing the exact conditions to reopen the portal and find Jane. All had failed.

On two or three occasions he even succeeded at the first task, but the automated tools and apparatus of retrieval proved inadequate each time. The personal ticking time-bombs of the remaining embroidery, which continued to stretch and fall apart with the growth of the now unmaintained garden, along with Liam’s own aging body and mind, forced him into a very difficult decision:

He would go into the void himself, body thoroughly covered in a composite protective suit, and attached by intangible forces to his most advanced and powerful iteration of radial servo motor, designed to pull them straight out of the volatile gateway when the deed was done. Liam was going to attempt to save Jane himself.

If he failed, after all, at least they would be together.

Device in hand, keychain brooch dangling, and thoroughly suited head-to-toe, Liam began a series of operations he had mentally run through countless times already.

The main interface read “ASSEMBLY and CONTAINMENT: T-30 seconds,” and the entire underground laboratory—what was left of it by then, at least—shook about in protest of its final experiment. The three arms of the silicon carbide containment unit emitted exotic forces yet unknown to the greater scientific community, and in an instant, a perfectly targeted wormhole had taken shape in the now trembling apparatus, something he made absolutely sure of this time through months of intense deliberation and calculation.

Liam took one last look at the home they had built and once hoped to spend the rest of their lives together in, before descending into the abyss.

“Jane, I’m coming for you.”

Paradise Is a Kind of Library

What is your favorite quote of all time? What gives it that status?

Being satisfied with my response, I decided to ask her, “So what’s your favorite quote?”

Harper wasted no time in considering her answer. “I have always imagined that paradise will be a kind of library.”

“Huh. What’s that from?”

“The Library of Babel, by Jorge Luis Borges. It’s about an infinite library of hexes that contains every possible combination of letters in 410-page books.”

“It wouldn’t be infinite, then.”

“Practically, it would be.” Harper looked down at the ground for a moment, and I looked off to the side, feigning indifference. We turned back at around the same time.

“It’s a wonderful idea, Max.” Her eyes grew wider. “Imagine a collection of literally every possible book. That library would have to contain all of Shakespeare’s sonnets, the Feynman lectures, Beethoven’s unfinished symphony encoded into alphabetic characters, long-form solutions to the Riemann hypothesis, P equals NP, and the twin prime conjecture—every book ever written, and every book that could be written, just waiting to be discovered.”

I paused a brief moment to take that all in. “Most of it would be nonsense, wouldn’t it?” I said, “Pretty much all of it, actually.”

“Yeah, and the book explores that. But the nonsense isn’t the point, Max. Everything takes up the same amount of space; there’s nothing to privilege the ‘meaningful’ over the ‘meaningless.’ English isn’t the only way to encode 26 Latin characters into words.”

Harper continued nonchalantly, thoroughly ignoring my bewildered expression, which indicated that this was just about the most interesting rant I’d ever listened to. “To speak is to fall into tautology. That’s another Borges quote. Words can only ever mean whatever meaning you decide to give them.”

“…So, they’re like life?” I blurted out, “Each word is an infinity of possible interpretations.”

“If you want to be platitudinal, sure.”

I frowned.

“I’m kidding,” said Harper, smiling. “And yes, each word is like an infinity, which makes the Library an infinity of infinities. It’s a tool to show just how absurd the ideas of semantics and meaning are.” Harper looked back at me with a pair of brilliant, amber eyes, presumably finished with her speech.

I carefully matched her eye contact. “It’s beautiful—er… like, the idea… I mean.”

She laughed. “Why thank you.”

Spirographs, Hypotrochoids, and Chaos

Earlier this week, in a course on ancient astronomy, I was introduced to the concept of “epicycles” and their place in Ptolemy’s geocentric theory. In the domain of astronomy, epicycles are basically orbits on orbits, and you can have epicycles on epicycles, and so on…

For some reason, we as a species were stuck on the idea of having only ideal circles as the paths for planets, not realizing that they could also be ellipses. Oh, and we also thought the Earth was the center of the solar system.

By the time humanity at large switched over to Kepler’s heliocentrism, the leading theory had some 84 epicycles in its full description. As it turns out, “adding epicycles” has since become synonymous with bad science—adding parameters in an attempt to get a fundamentally flawed theory to fit increasingly uncooperative data. Go figure.

While the science behind epicycle astronomy is very much false, epicycles do trace out some frankly beautiful patterns if you follow their orbits. The patterns reminded me of spirographs and, as far as I can tell, for very good reason. A spirograph is really just a special case of an epicycle path, where the outer orbit perfectly matches up with the inner orbit’s rotation speed.

Spirograph-esque, no?

I then wondered whether I could generate spirographs myself to mess around with. It turned out to be fairly straightforward: spirograph patterns are equivalent to a mathematical hypotrochoid with spinner distance equal to or less than the inner radius, which is basically the shape you get by following a point attached to a circle rolling inside a larger circle.

Hypotrochoids are even more interesting, though, since the spinner distance gets to be greater than the inner circle’s radius if you want it to be. In fact, you can even set the radius of the inner rolling circle to be larger than the outer radius it rolls in, or even negative! This isn’t something you could replicate with a real, physical spirograph, but since it’s just a mathematical model living inside my computer, we can do whatever we want with it.

In total, we have five parameters to work with—inner radius, outer radius, spinner distance, revolutions, and iterations. By messing around with the numbers, I’ve gotten it to produce some utterly insane graphs. I’d be lying if I said I fully understood the path behind some of these, but I’ll try my best to show and explain what I’ve found, and later on I’ll propose a mathematical challenge behind the methods that I’m currently working on.

Anyways, let’s take a look at some spirographs (more accurately, hypotrochoids).

Proof Of Concept


The function for a hypotrochoid is traditionally defined parametrically—in terms of x and y. Python lets us define lambda functions easily to model these like so:

import numpy as np

x = lambda d,r,R,theta: (R-r)*np.cos(theta) + d*np.cos(((R-r)/r)*theta)
y = lambda d,r,R,theta: (R-r)*np.sin(theta) - d*np.sin(((R-r)/r)*theta)

Let’s set the theta to be in terms of a more intuitive quantity (revolutions):

revs = 30
Niter = 9999
thetas = np.linspace(0,revs*2*np.pi,num=Niter)

Now, “revs” (revolutions) sets the number of times the inner circle makes a complete rotation (2 pi radians each), while “Niter” (N iterations) is the number of points we take along the path drawn (the “resolution” of our graph).

As an initial test, let’s set the relevant variables like so:

d = 5 #Spinner distance from center of smaller circle
r = 3 #Smaller circle radius
R = 20 #Larger circle radius
revs = 6
Niter = 5000
The first result of many

What do we have here? Looks pretty spirography to me.
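(A quick aside for anyone reproducing this: the snippets above leave out the plotting step. Here is a minimal sketch of it, assuming matplotlib, using the lambdas and parameters already defined; note that thetas needs to be rebuilt once the test values of revs and Niter are set.)

import matplotlib.pyplot as plt  # assumed plotting library; the original snippets don't name one

thetas = np.linspace(0, revs*2*np.pi, num=Niter)  # rebuild the angle samples with the test values above

plt.plot(x(d, r, R, thetas), y(d, r, R, thetas), linewidth=0.7)
plt.axis('equal')  # equal aspect ratio so the curve isn't distorted
plt.show()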

We can change the amount and size of the “loops” by changing the ratios between the two radii and spinner distance. With different parameters:

d = 3 #Spinner distance from center of smaller circle
r = 2 #Smaller circle radius
R = 5 #Larger circle radius
revs = 6
Niter = 5000
The ever-elusive 5-leaf clover

So this is fun and all, but I’ve found that we’re mostly limited to these loopy patterns (I call them “clovers”) if we don’t start messing with the iterations.

Lower the “resolution” of the graph, so we only take points every once in a while, and we can produce much more interesting plots.

d = 5 #Spinner distance from center of smaller circle
r = 3 #Smaller circle radius
R = 20 #Larger circle radius
revs = 200
Niter = 1000
All tangled up!

Cool, right? Check this out, though:

d = 5 #Spinner distance from center of smaller circle 
r = 3 #Smaller circle radius 
R = 20 #Larger circle radius 
revs = 200 
Niter = 1020
???

Holy smokes, what happened here?

Keep in mind that the parameters used for this one are nearly identical to the one before it, differing only in iterations (1020 compared to 1000). That means that the only difference between them is a slight difference in angle spacing between the samples.

A small difference makes a big impact when we let it fester over a few hundred rotations. Arguably, this makes the system a rather chaotic one.
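To put a rough number on that, here’s a quick back-of-the-envelope check of the two runs (no plotting, just the spacing arithmetic; the 500-sample mark is simply the middle of the run):

import numpy as np

revs = 200
total_angle = revs * 2 * np.pi

# np.linspace(0, total_angle, num=N) spaces the samples total_angle/(N-1) apart
step_1000 = total_angle / (1000 - 1)
step_1020 = total_angle / (1020 - 1)

print(step_1000, step_1020)                       # ~1.258 rad vs ~1.233 rad per sample
print((step_1000 - step_1020) * 500 / (2*np.pi))  # ~2 full turns of accumulated drift by the 500th sample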

Here’s a few more examples of the low-resolution regular interval trick making insane graphs:

d = 5 #Spinner distance from center of smaller circle
r = 17 #Smaller circle radius
R = 11 #Larger circle radius
revs = 160
Niter = 500
This one’s my current personal favorite.
d = 1 #Spinner distance from center of smaller circle
r = 11 #Smaller circle radius
R = 12 #Larger circle radius
revs = 121
Niter = 500
This one has a certain elegance that I’m really digging.
d = 3 #Spinner distance from center of smaller circle
r = 5 #Smaller circle radius
R = 7 #Larger circle radius
revs = 3598
Niter = 629
My friend picked five random numbers for this one. She named it “byssustortafiguraphobia,” after searching for the latin roots that most closely translate to “fear of twisted shapes.”

If I were to show every graph that I thought was interesting while messing around with this, this webpage would take a very, very long time to load (I’m probably already pushing it). But feel free to check all of them out in my public document here. It includes many that I didn’t find a place for in this post.

Chaotic Systems


Remember that pair of completely different graphs we made earlier, where the only actual difference between the generation was a slight change in angle spacing between the samples? I actually found a lot of those, and the results are pretty wonderful.

But before we look at a few more examples of graphs changing wildly with small changes to their parameters (in my opinion, the coolest situations), here’s a more generic situation. Basically, I just wanted to point out that, most of the time, small changes can only make… well, small changes. See for yourself:

Yes, this is a screenshot of a Facebook Messenger album that was composed of photos of a computer screen that I took using my phone. Sorry, not sorry. It’s mayhem over here.

For reference, the parameters used for those 9 snapshots were:

d = 6 #Spinner distance from center of smaller circle
r = 7 #Smaller circle radius
R = 8 #Larger circle radius
revs = 2000

—where Niter was 452-460. Just take my word for it that this boring sameness happens almost all of the time.

As for the times where it doesn’t happen…

d = 5 #Spinner distance from center of smaller circle
r = 11 #Smaller circle radius
R = 12.6000 #Larger circle radius
revs = 1000

With these parameters, and 200 iterations we get:


Okay, all good. With those same parameters and 201 iterations, we get:


Um, what happened? The simple explanation is that, at 200 divisions of the inner circle’s rotation angle, the samples happened to be offset slightly on each pass, tracing out the (relatively) normal spirograph pattern we’re used to. At 201 divisions, though, the samples just happened to line up on the same points each time they came around the curve. Funky.

Okay, so here’s another, even more insane example. Prepare to have your mind blown.

d = 5 #Spinner distance from center of smaller circle
r = 3 #Smaller circle radius
R = 8 #Larger circle radius
revs = 50050

In the following graphs, the iteration count runs from 997-1003:

Looks like… just some squares? Kinda lame.
Oh my.
The symmetry here makes NO sense. And how does this follow from what we just had?
Looking like a Picasso now.
Yeah. Same function, I swear.
And we’re back to “normal.”

Crazy, right? I won’t pretend to know exactly what went down in this example; the extent of my knowledge is pretty much the combination of my earlier explanations on where the “chaos” of this system comes from and how that means that sometimes small changes make immense differences in the final graph.

Extra Graphs


Before we cap things off, I wanted to show off a final few spirographs that I like a lot.

Here’s one that demonstrates how curved lines can be approximated using a series of straight ones:

d = 1 #Spinner distance from center of smaller circle
r = 4 #Smaller circle radius
R = 5 #Larger circle radius
revs = 100
Niter = 325

The dark blue line is the generating function, and the cyan lines are the spirograph that it makes. As discussed earlier, the spirograph is really just a series of points connected from a generating function.

Curiously, it looks like the spirograph itself maps out another generating function, something that could be found under the same set of rules (a mathematical hypotrochoid)! I’ll leave it up to you to figure that one out.

Here’s another:

d = 20 #Spinner distance from center of smaller circle
r = 69 #Smaller circle radius
R = 63.6 #Larger circle radius
revs = 740
Niter = 2030
Spooky.

I called that one “Doom Hands.” Pretty hellish, right?

Okay, last one:

d = 20 #Spinner distance from center of smaller circle
r = 69 #Smaller circle radius
R = 61.6 #Larger circle radius
revs = 10000
Niter = 10000
The Homer Simpson curve.

I call that one the “Very Filling Donut” because, well, you know.

Final Notes


So first off, I want to say that I did actually show this to the astronomy professor I mentioned at the beginning of my post. He’s an older fellow who mostly teaches required, main-series intro physics courses (read: uninterested engineering students), so I figured I could brighten his day up by showing him that someone made some pretty cool stuff mostly inspired by what he taught.

I showed it to a few other teachers and friends earlier who liked the idea, but without his approval in particular, the whole effort felt almost incomplete. Of course, being the guy that inspired it, I was expecting him to like it the most. I showed it to him with high hopes.

He didn’t seem all that interested. You win some.

Second, if you’re wondering how I created that cool animation at the top, I used Desmos, an online graphing calculator with surprisingly robust animation functions. Here’s the exact notebook I used (complete with a bonus animation!).

Lastly, there’s actually an interesting class of problems that arises from hypotrochoids that I’ve been working on for a while now, and I’ve had a bit of progress. Take a look at this graph:

d = 1 #Spinner distance from center of smaller circle
r = 11 #Smaller circle radius
R = 12 #Larger circle radius

(The exact numbers for revs/iterations aren’t really important if you just want generating functions/plain hypotrochoids—just make them really large relative to the other numbers. See the first two “clover” graphs at the beginning)

Here’s an interesting question: how many closed regions are in that graph? It’s kind of a lot (P.S. I don’t actually know, since counting them seems like a dry exercise, but have at it if you want to kill a few minutes).

I thought the total number of regions in this graph was an interesting problem, in the sense that trying to figure it out analytically would be a lot of fun. In clear terms, the challenge would be as follows:

Find a function of the three relevant parameters of a hypotrochoid (i.e. inner radius, outer radius, and spinner distance) that gives the number of closed regions its graph will form.

The problem is stated simply, but it’s not trivial to solve (at least for me, strictly a non-mathematician). So far, I’ve figured out that the number of “loops” L (i.e. the number of revolutions the inner circle performs before it returns to its exact original state) can be consistently found with this formula:

L = \frac{\operatorname{lcm}(r, R)}{r}

—where r and R are the inner and outer radius, respectively, and lcm() finds the least common multiple of its inputs. In certain ideal situations, the number of loops (or that number plus one) equals the number of closed regions, but most cases don’t follow that trend. The loops usually cross over each other (like in the graph above), immensely complicating the problem.
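For what it’s worth, the loop formula is a one-liner to check in Python (assuming integer radii; math.lcm needs Python 3.9 or later):

from math import lcm  # Python 3.9+

def loops(r, R):
    # number of inner-circle revolutions before the curve repeats
    return lcm(r, R) // r

print(loops(2, 5))   # 5, consistent with the "5-leaf clover" parameters earlier
print(loops(3, 20))  # 20, for the radii used in the very first example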

What do you think? Have at it, and tell me how it goes.

Introduction to Thermostatistics and Macroscopic Coordinates

A Coarse Description of Physics


Thermostatistics lies at the intersection of thermodynamics and statistical mechanics. Thermodynamics is the study of the movement of heat and energy and the heat and energy of movement. On the other hand, statistical mechanics is a branch of theoretical physics that applies principles of probability theory to study the average behavior of a system when it would be difficult to apply more direct methods (as is often the case in thermodynamics).

Statistical mechanics is kind of remarkable when you consider how people use its conceptual framework—averaging the useful properties of complex systems—on the daily. Here’s an example paraphrased from my study book: imagine going to the drug store to purchase a liter of isopropyl alcohol. For the situation at hand, this simple, volumetric specification is pragmatically sufficient. Yet, at the atomic level, we have actually specified very little.

The container you actually want is one filled with some 8 septillion molecules of CH₃CHOHCH₃. To completely characterize the system in the mathematical formalism, you would need the exact coordinates and velocities of every atom in the system, as well as a menagerie of variables describing the bonds, internal states, and energies of each—altogether at least on the order of 10^25 numbers to completely describe that thing you were able to specify earlier by just asking for “a liter” of alcohol!

Yet somehow, among all those 10^25 coordinates and velocities and energies and state variables, every single one, save for a few, is totally irrelevant to describing the macroscopic system. The few that emerge as relevant are what we refer to as macroscopic coordinates or thermodynamic coordinates.

The key to this macroscopic simplicity is threefold:

  1. Macroscopic measurements are extremely slow at the atomic scale of time.
  2. Macroscopic measurements are extremely coarse at the atomic scale of distance.
  3. The scope of macroscopic measurement is just about the scope of what is useful to human beings doing normal human things.

For example, to determine the size of an object too far away to measure directly, you might take a photograph of it next to a reference scale. The speed at which this measurement takes place is set by your camera’s shutter, an action on the order of hundredths of a second. Meanwhile, the kinetic motion and vibration of particles at the surfaces of the object, which are constantly at work altering its observable size, act on the order of 10^-15 seconds.

Macroscopic observation can never respond to such minute action. At best, under ideal circumstances, we can consistently detect the macroscopic consequences of microprocesses down to the range of 10^-7 seconds. As such, only those combinations of coordinates that are relatively time-independent are macroscopically useful.

The word “relatively” is an important qualifier here. While we can measure processes quite “finely” in time relative to discernible human experience, that is still far from the atomic scale of 10^-15 seconds.

It seems rational, then, to construct a theory to describe all the relationships of the time-independent phenomena of macroscopic systems. Such a theory is called thermodynamics.

In considering the few coordinates that are time-independent, some obvious candidates arise: quantities constrained by the conservation laws, like total energy or angular momentum, are clearly properties that are unaffected by time.

We’ll soon find that there are many more relatively time-independent coordinates dealt with in the broad scope of thermodynamics and thermostatistics.

The Thermodynamic Definition of Heat


Of the ludicrous amount of atomic coordinates, we’ve found that only a few combinations with some unique symmetry properties can survive the merciless averaging associated with transitioning to a macroscopic description. Some “surviving” coordinates prescribe mechanical properties, like volume or elasticity. Others are electrical, like the electric and magnetic dipole moments, various multipole moments, etc. Under this description, we can rewrite broad areas of physics according to which macroscopic coordinates they focus on.

Classical mechanics is then the study of one closely related set of surviving atomic coordinates. Electromagnetism is the study of another set of surviving coordinates. Thermodynamics and thermostatistics, on the other hand, are concerned with those numerous atomic coordinates that, by virtue of the coarseness of macroscopic measurement, are not defined explicitly in the macroscopic description of a system.

To illustrate, one of the most evident consequences of these “hidden” coordinates is found in energy. Energy transferred mechanically (i.e. associated with a mechanical macroscopic coordinate) is called “mechanical work,” and its macroscopic consequences are extensively treated in other areas of physics. Energy that is transferred electrically is called “electrical work,” and so on and so forth.

Notice, however, that it’s just as possible for energy to be transferred through the motions hidden from macroscopic measurement as through the ones that are easily observable. Energy transfer that occurs through these hidden modes is called heat.

(Of course, this definition serves only to aid with situating heat within the macroscopic coordinate framework. We’ll soon get a more adequate working definition—basically, a mathematically sound one—to use in our studies)

So what do you say? Are we ready now for some calculations?

Entropy and the Fundamental Laws

Like every field of physics worth its salt, thermodynamics has, at its heart, some fundamental, unchanging principles dubbed “laws.” Thermodynamics has four laws, arguably five. Inexplicably, you’ll find in the literature that they are usually not numbered 1-4 but instead 0-3. (There’s probably a legitimate reason for this that I’m not bothering to find, but in my defense, how legitimate can a reason really be if it completely defies proper numbering conventions and basic logic?)

No matter. The four laws, in order from, erm… zero to three, are simply stated as follows:

The Zeroth (…why?) Law of Thermodynamics – If two thermodynamic systems are each in thermal equilibrium with a third, then they are in thermal equilibrium with each other.

The First Law of Thermodynamics – Energy can neither be created nor destroyed. It can only change forms.

The Second Law of Thermodynamics – It is impossible for a process to have as its sole result the transfer of heat from a cooler body to a hotter one.

The Third Law of Thermodynamics – As temperature approaches absolute zero, the entropy of a system approaches a constant minimum.

The reason I noted that there are arguably five laws (except that, in this blasted numbering system, the fifth would henceforth be labeled the “fourth”) is that the all-important ideal gas law isn’t included here. Sure, it isn’t as directly related to energy transfer as the other four, but you’ll soon find that PV = nRT is involved in much more than the trivial algebra of your CHEM 101 course.

Before we go into more detail about the laws and their implications, we need to discuss what is possibly the most important yet nebulous and oft-misunderstood concept in thermodynamics: entropy.

Entropy – A Better Description


You will often hear entropy described as “disorder,” but this description is actually rather misleading; between, say, a cup filled with a bunch of crushed ice cubes and a tall glass of water, the glass of water actually has the higher entropy in the context of its environment, even though you’d be hard pressed to find anyone who would argue it’s more “disordered.”

Basically, the issue lies in the fact that “disorder” is subjective and does not have a rigorous scientific definition, while entropy, the thing you’re trying to describe it with, does.

So there are numerous more accurate descriptions of entropy available than simply calling it “disorder” and being done with it. In my opinion, the best and most useful description for understanding thermodynamics goes like this: entropy is, at its core, a statistic. It measures the distribution of energy in a system. In particular, it measures the locations that the energy is stored in, quantifying how spread out these locations are.

At the microscopic level, energy is quantized, meaning it comes in discrete amounts. This means that microscopic energy isn’t continuous like the real number line, but more analogous to the set of all integers (this also means that dividing energy into “units,” as we’re doing, is a lot more accurate than you probably just gave me credit for).

We can effectively demonstrate this description of entropy with a simple case study. First, imagine a closed system containing two identical molecules, A and B, that each can store energy. Now suppose that each molecule has 6 distinct locations that all can store an arbitrary number of discrete units of energy, and that there are 8 energy units in total available in the system.

We can now easily see that there are many different possible states the system can take on (e.g. 1 unit somewhere in molecule A and 7 in molecule B). These are called microstates.

Let’s also assume that each distinct microstate is equally likely to occur (e.g. the state which corresponds to 2 units placed in some arrangement in molecule A and 6 in molecule B is just as likely as 4 somewhere in each), and that a microstate counts as distinct if at least one energy unit is placed in a different location.

For example, 2 energy units might be placed in the same spot in molecule A, and so both are taking up one of any of the six possible locations, or each could be in a different spot; every possible configuration of these, multiplied by all the different ways to arrange the remaining 6 energy units in molecule B, would be considered a distinct microstate.

It turns out there are 75582 ways to organize those 8 units of energy in the system we have just described. The distribution works out like this:

Energy Units in A    Energy Units in B    Possible Microstates    Probability
0                    8                    1287                    2%
1                    7                    4752                    6%
2                    6                    9702                    13%
3                    5                    14112                   19%
4                    4                    15876                   21%
5                    3                    14112                   19%
6                    2                    9702                    13%
7                    1                    4752                    6%
8                    0                    1287                    2%

(Caution: the probabilities add up to 101%; that’s just an artifact of rounding.)
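If you’d rather not count by hand, the whole table falls out of a stars-and-bars calculation. A minimal sketch in Python, using only math.comb:

from math import comb

LOCATIONS = 6  # energy-storage locations per molecule
UNITS = 8      # total energy units in the closed system

def ways(n, k=LOCATIONS):
    # stars and bars: ways to put n indistinguishable units into k distinct locations
    return comb(n + k - 1, k - 1)

total = sum(ways(a) * ways(UNITS - a) for a in range(UNITS + 1))
print(total)  # 75582

for a in range(UNITS + 1):
    states = ways(a) * ways(UNITS - a)
    print(a, UNITS - a, states, str(round(100 * states / total)) + "%")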

If every microstate is indeed equally likely, then we can clearly see which distributions of energy the system will tend towards over time. The distributions where the energy is more evenly spread (3 and 5, 4 and 4, etc.) are much more likely to occur than the ones at the edges, where the energy is mostly concentrated in one molecule or the other. Entropy quantifies this spread of energy, and since 4 and 4 is more evenly spread than 1 and 7, it has a correspondingly higher entropy.

It’s important to note that entropy increases with the number of possible microstates in any given distribution. Since there are fewer ways to arrange the energy units when all 8 are in one molecule than when there are 4 in each, the all-in-one case has a correspondingly lower entropy.

This correlation between possible microstates and entropy implies that higher-entropy states are more likely. If we let this system run, then after a sufficient interval of time there would be a 21% chance of finding it with 4 units of energy in each molecule, with lower chances as you move out towards the edges.

Interestingly, you may notice that there are cases, though less likely, where the entropy actually goes down. In other words, cases where the energy becomes less spread out. If we start with 5 energy units in one molecule and 3 in the other, there’s actually a 13% chance that the next time we check on the system, the distribution will have become 6 and 2. In fact, there’s even a 2% chance that all the energy units end up in the molecule that started with more. If this energy were stored in part kinetically (i.e. as temperature, at the microscopic scale), the “hotter” molecule will have just become hotter and the “colder” molecule colder, even though they are completely free to transfer energy between each other!

This is completely counterintuitive to anyone who has ever burned themselves on a hot stove. Hot objects (the pot) always transfer energy to colder objects (your now-burnt hand). Yet, in the system we just described, there’s a 21% total chance it ends up in a state less entropic than the one it started in. Certainly you’ve never touched a hot pot only to find that your hand cooled down, right? Ice cubes melt in water until they reach an equilibrium temperature, rooms get messy without deliberate intervention, and hands touching hot pots get burned. It’s just how things work; energy wants to equalize. What gives?

To put it simply, at a macroscopic scale (the general size range of objects like burnt hands and ice cubes), the amount of atoms—and thus, places where energy can be stored—is so unimaginably high that, when you do the math, the disparity between the higher probabilities of “spread out” states and lower probabilities of states where most or all the energy is concentrated in one location is way too large for it to ever realistically be the case where, for example, your hand cools down upon touching a hot pot.

Let’s go back to our original system of two molecules. Imagine multiplying all the parameters one-thousand fold (still far from the realm of macroscopic systems), so that we now have 6000 distinct locations to store energy in each molecule (now probably more appropriately called an “object”) and 8000 discrete energy units. Now, let’s pose the obvious follow-up question.

Question: If we start from a reasonably uneven microstate—one where 6000 of the energy units are stored somewhere in object A and 2000 in object B—what’s the probability of this microstate evolving such that net energy moves from object B to A instead of the reverse, as we’d expect?

Answer: Just about 0.000000000000000000000000000003%

That’s the beauty of combinatorics: when you mess with big numbers, you’ll quickly find some ridiculous scaling.

Remember that the system which gave us that astronomically low probability is still much, much smaller than a macroscopic system. For reference, your hand is about 0.5% of your body weight, which itself has about 7 billion billion billion atoms. So, we find that a human hand is about 35,000,000,000,000,000,000,000,000 atoms.

Triple that number to estimate the amount of discrete locations where energy can be stored in it, and we can soon see that, compared to the measly 6000 locations which already gave an astronomically low chance of entropy decreasing, the probability that a system of macroscopic objects acts against the second law of thermodynamics, even briefly, is essentially too unlikely to ever occur.

Well, isn’t that remarkable? Heat doesn’t “want” to transfer to colder objects, whatever our original intuition might say. The simple fact is that energy occupies whatever state it happens to fall into. It’s just that, at macroscopic scales, the higher entropy states happen to be overwhelmingly more likely than the lower entropy states by virtue of having more possible microstates. This is where the “disorder” description of entropy comes from; there are simply more ways to be disorderly than orderly. Your hand got burnt because the heat energy of the pan was allowed to evolve in its state by your touching it, and it ended up randomly (as it basically always will) in a higher entropy state: a hotter hand and a slightly colder pan.

Some additional notes: entropy can be quantified, and its quantity is oft-used in thermodynamics calculations. To give you a taste, the formula most commonly given for entropy is Boltzmann’s equation:

S = k ln(W)

where k is the Boltzmann constant, equal to 1.38065 x 10^(-23) J/K, and W is the number of possible microstates.
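Purely to give a sense of scale, here’s the formula applied to the microstate counts from the toy example above (illustrative only; the W of any real macroscopic system is unimaginably larger):

import math

k = 1.38065e-23  # Boltzmann constant, J/K

def S(W):
    # Boltzmann entropy of a state with W possible microstates
    return k * math.log(W)

print(S(15876))  # the 4-and-4 split of the toy system
print(S(1287))   # the 8-and-0 split: fewer microstates, lower entropy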

Back to the Laws


Do you remember the second law? If not, here’s a refresher:

The Second Law of Thermodynamics – It is impossible for a process to have as its sole result the transfer of heat from a cooler body to a hotter one.

Did you catch the error? We know now that this statement is not entirely true. Energy transfers can sometimes occur from cooler bodies to hotter ones in a closed system, provided the system is small enough for it to be reasonably likely.

In macroscopic systems, like a car engine, your hand, or the entire universe, it may as well be true, but let’s now rephrase it to be more precise, using our newfound knowledge of entropy.

The Second Law of Thermodynamics – The entropy of an isolated system not in equilibrium will tend to increase over time, approaching a maximum value at equilibrium.

With that sorted, let’s revisit the implications of the other laws, starting with the first, er… zeroth.

The Zeroth Law of Thermodynamics – If two thermodynamic systems are each in thermal equilibrium with a third, then they are in thermal equilibrium with each other.

Not much to add here. It’s the thermodynamic equivalent of the transitive property—a=c and b=c implies a=b. Keep in mind, though, what it lets you deduce about systems in thermal equilibrium (states that are spatially and temporally uniform in temperature).

Two systems are said to be in thermal equilibrium if they are connected by a path permeable to heat and no heat transfer occurs over time.

The First Law of Thermodynamics – Energy can neither be created nor destroyed. It can only change forms.

So in thermodynamics, you’ll find that the concept of thermal energy and its relation to work is used a lot. Let’s just add a little extra onto this law to acknowledge that fact.

The First Law of Thermodynamics – Energy can neither be created nor destroyed. It can only change forms. The change in the energy of a system is the amount of net energy added to the system minus the net energy spent doing work.

Perfect.

(This can be mathematically represented as ΔU = Q − W, where ΔU is the change in the system’s energy, Q is the heat added to the system, and W is the work done by the system.)

The Third Law of Thermodynamics – As temperature approaches absolute zero, the entropy of a system approaches a constant minimum.

Okay, so there are some interesting implications to this.

First, that it is impossible to reduce any system to absolute zero in a finite series of operations, which basically means that absolute zero cannot be achieved.

This also means that a perfectly efficient engine, one which delivers work precisely equivalent to the heat energy put in, cannot be constructed. That’s because the efficiency of a heat engine depends on the ratio of the difference between the absolute temperatures of its hot and cold sections to the absolute temperature of the hot section (i.e. unless the cold section sits at exactly absolute zero, that ratio, and thus the efficiency, will be less than one).
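In symbols, that efficiency ceiling (the Carnot limit) reads:

\eta_{max} = \frac{T_h - T_c}{T_h} = 1 - \frac{T_c}{T_h}

where T_h and T_c are the absolute temperatures of the hot and cold sections; the ratio only reaches one if T_c is exactly zero, which the third law rules out.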

That’s all for the fundamental laws, and for this section.

Interesting stuff, no? Maybe I was wrong earlier when I said thermodynamics was boring.

Thermodynamics and Thermostatistics

Introduction


It can be argued with some vehemence that thermodynamics is just about the most boring field of study in all of physics. With all due respect to a vastly important and verifiably useful field, thermodynamics is not, at a core conceptual level, a particularly interesting subject.

Many study physics in search of escape—the great unknowns of astronomy or the mind-boggling, unintuitive nature of quantum mechanics. Thermodynamics, on the other hand, doesn’t carry that same mystique at all.

On the contrary, your first ventures into something like quantum mechanics will yield exceptionally interesting results (as a pedagogical tool, they are often designed to). Even as you complete your first lessons in the field, you come to some truly perplexing and profound conclusions: that things can exist in superposition, that our complete theory of the world can never be both “realistic” and local (Bell’s theorem), and that nature, at the smallest scale, is intrinsically discrete.

When you first finish a long, intense bout of thermodynamic calculations, you can only hope to come to the striking realization of: “Oh, I suppose the system would then heat up by 10 kelvin.”

Thermodynamics will not attempt to entice you to understand it for the sake of understanding it; you will instead need the discipline to work through it simply because it is a necessary preliminary for what lies ahead.

To summarize, the problems I and many others have long had with the field (which you should be aware of before proceeding):

  1. Compared to most other fields of physics, it’s conceptually dull.
  2. The mathematics are tedious to perform (as one fellow student once put it, thermodynamics is a “zoo of partial derivatives”).
  3. Thermodynamic systems and calculations describe macroscopic, “real-life” scenarios, unlike the (arguably) more exciting areas of other physics studies.

And so finally, now that we have been made keenly aware of all the troubles a physics student will likely face when making the initial venture into thermodynamics, we will study exactly what I warned you about, in all its excess of real-world analogs and unrelenting mundanity.

But hey, maybe that’s your thing.


A Python Script to Pick Me Outfits Based on the Weather

There’s a well-supported theory in psychology called “decision fatigue,” which predicts that your decision-making ability goes down as you’re forced to make more decisions throughout a day. As a real life example, in supermarkets, candy and processed snacks are regularly placed near the cash register* to take advantage of your decision fatigue after a long stint of making decisions on which groceries to buy.

*Pictured here: the culprit (Also: an interesting read on decision fatigue’s role in day-to-day life).

On a similar note, there are actually many examples of powerful politicians and businessmen reducing their wardrobes down to a few or even just one outfit in order to minimize the amount of trivial decisions that have to be made throughout a day—think Steve Jobs or Mark Zuckerberg in their simple, iconic garbs.

As former president Barack Obama said of his famously slim wardrobe, “You’ll see I wear only gray or blue suits. I’m trying to pare down decisions. I don’t want to make decisions about what I’m eating or wearing, because I have too many other decisions to make.”

A few years ago, I (still in my early teens) was probably most concerned with two things in my life:

    1. How I looked (and by extension, what I was wearing).
    2. Feeling like I made good decisions (emphasis on “feeling”).

But decision fatigue seems to indicate that these two goals are incompatible; if I really wanted to stop wasting mental energy on picking out an outfit every morning, I should’ve just adopted a uniform to wear daily, like Barack or Steve. When I thought about it though, I really didn’t like the idea of wearing the same thing every day, or even outfits from the same, small set of clothes (i.e. a “capsule” wardrobe). Still, I also wasn’t about to just give up on trying to rid myself of my clothing-based decision fatigue.

The clear compromise was to get my computer to do so for me; every morning, instead of laboring over what to wear, I would load up a program that spits out an outfit or a few on the daily, down to the smallest accessory. I’d follow it without question, and so (presumably) would never again have to painstakingly consider what to put on my body in the morning. This is what passed for a good idea in the mind of 15-year-old me.

The thing is, I actually ended up (mostly) finishing the project. And while I don’t ever really use it anymore, I figured it would be a fun thing to share and pick apart today. (The file is here if you want it, and the spreadsheet required to run it is here. You’ll also need PyOWM.)

Let’s take a look.

Moods and Weather


Before picking outfits at random, it seems reasonable that we would need to “prune” the potential list based on a few categories. Weather was the first and most obvious; if it was 90 degrees outside in LA, I’d better not be told to wear a parka and ski pants.

Luckily, for the weather, we can use the free Open Weather Map API (OWM) along with a wrapper for Python called PyOWM for easy interfacing with the local weather data. OWM is a commercial API designed mostly for business or agricultural use, so it was fun letting them know exactly what I planned on using my API key for:

I think this image sums up my early teenage years pretty handily.

The other important category was something I called “mood,” which was supposed to be the feeling you wanted out of your clothes that day (outfits could encompass multiple moods).

My four preselected “moods” were:

    1. Cool (Default, daily-driver outfits)
    2. Sexy (For when I’m feeling extra confident)
    3. Cozy (Comfortable and lazy)
    4. Fancy (Anything from business casual to a full on suit-and-tie ensemble)

So the user-input loop would have you select a mood and then automatically find the weather in your city for that day. It would then take the list of outfits at the intersection of those two categories and pick a few at random.

If you’re curious, the input loop looked like this:

while 1:
    category_choice = input("Today I'm feeling... 1)Cool 2)Sexy 3)Cozy 4)Fancy 5|Other Options ")
    if category_choice in accepted_in:  # accepted_in and category_list are defined earlier in the script
        choice = category_list[int(category_choice) - 1]
        break
    elif category_choice == '5':
        while 1:
            option_choice = input("Other Options: 1)Themes 2)Add an Outfit 3)Force Weather 4)Back to Selection ")
            if option_choice == '4':
                break
            elif option_choice in accepted_in:
                if option_choice == '3':
                    force_temp = True
                    try:
                        temp = float(input("What is the temperature (high) for today (in fahrenheit)? "))
                    except ValueError:
                        print("Oops. Enter a valid number.")
                #outfitter()
                #themer()
            else:
                print("Oops! Enter '1', '2', '3', or '4'")
                continue
    else:
        print("Oops! Enter '1', '2', '3', or '4'")
        continue

(“Outfitter” was supposed to be the function for adding new outfits, but I never got around to implementing that. The “themer” function is in the code, but not put into the loop here. “Force weather” lets you manually set the weather, if you want to wear cold weather clothes in hot weather for some inexplicable reason)

And the PyOWM “weather finder” looked like this:

#Takes input in degrees and outputs the array truncated by temperature. Prints the weather category.

def select_by_degrees(degrees,categ_array):
    if degrees >= 90:
        weather_array = categ_array[categ_array['Weather_value'] >= 3]
        print("Today is hot! (~%.1f F\xb0)" %degrees)
    elif degrees >= 70:
        weather_array = categ_array[(categ_array['Weather_value'] >= 1) & (categ_array['Weather_value'] <= 4)]
        print("Today has fair weather. (~%.1f F\xb0)" %degrees)
    else:
        weather_array = categ_array[(categ_array['Weather_value'] <= 1) | (categ_array['Weather_value'] == 3)]
        print("Today is cold... (~%.1f degrees F\xb0)" %degrees)
    return(weather_array)

(This is what prunes the full outfit array into only those that can “work” in the right temperature. Degrees are in fahrenheit.)

#OWM implementation uses Open Weather Map API to find today's forecast for East LA. Manual input if cannot be accessed (e.g. no WiFi connection).

def select_by_inputs(categ_array):
    try:
        owm = OWM('0d68d0be097dc01d8a14a1ff41785d03', version= '2.5')
        fc = owm.daily_forecast(city, limit = 1) 
        f = fc.get_forecast()
        #print(f.get_weathers)
        w = f.get_weathers()[0]
        #print(w.get_temperature)
        if force_temp == False:
            temp = (float(w.get_temperature('fahrenheit')['max'])+5)
        rain = fc.will_have_rain()
        if rain == True:
            print("Rainy day! Bring an umbrella.")
    except:
        while 1:
            try:
                temp = float(input("What is the temperature (high) for today? (in fahrenheit) "))
                break
            except:
                print("Oops! Enter a number.")
                continue
    weather_array = select_by_degrees(temp, categ_array)
    return list(weather_array['Outfits'])

(This is the OWM implementation, courtesy of PyOWM. The ‘0d68d0be097dc01d8a14a1ff41785d03’ is my API key. I’d recommend generating your own key if you download this code to try it, but you don’t have to. Extensive PyOWM documentation can be found on its GitHub at the link above.)

Design Philosophy


I initially toyed with the idea of having each piece of an outfit be put together at random, and even a neural network that learned over time which pieces go with each other. Neither idea seemed very good or tenable, so I went with user-defined, static outfits instead. Each outfit would be stored in a spreadsheet, along with a few variables to define its “categories.”

I had a column for each mood, with ones and zeros telling whether it fit those moods or not, and another column for a “weather value,” which encoded the temperature ranges it could be worn in.

The spreadsheet looked something like this:

Honestly, I’d still wear a lot of these

And the temperature column values break down like this:

0) Cold weather only
1) Cold and fair weather
2) Fair weather only
3) All weather
4) Fair and hot weather
5) Hot weather only

Six values to represent all the distinct, reasonable combos of 3 broad types of weather (because clothes that only work in hot and cold weather, but not fair weather, aren’t reasonable).
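To make the pruning concrete, here’s a rough sketch of how the mood filter, the weather filter, and the random pick fit together. (The spreadsheet filename and the “Cozy” column name are assumed for illustration; “Weather_value” and “Outfits” are the columns used in the code above, and select_by_degrees() is the function shown earlier.)

import random
import pandas as pd

outfits = pd.read_excel("outfits.xlsx")  # filename assumed; the real spreadsheet is linked above

# mood columns hold 1s and 0s; "Cozy" is one of the four moods listed earlier
cozy = outfits[outfits["Cozy"] == 1]

# prune by temperature with select_by_degrees() from the post, then pick a few at random
candidates = select_by_degrees(72.0, cozy)
print(random.sample(list(candidates["Outfits"]), k=min(3, len(candidates))))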

I started adding the option for “themes”—special outfit types that only really work during a specific time: Christmas/Holiday parties, going to a rock concert, blacklight parties, etc. I didn’t really get to adding a lot of themes, but the code is in there and working.

Planned Features


I planned a lot of features that I never actually finished. For example, the ability to add or delete new outfit entries through the program, instead of editing the spreadsheet directly. This was tricky to do using Pandas, the data array editing module I used.

Another feature I wanted to add was a way to count how many times I wore a certain outfit. Then, each time I wore an outfit, it would add to each of the garments’ “wear counts.” At the end of every year, I’d find out which clothes I wore the least (or not at all) and get rid of those. Again, editing spreadsheets through Pandas proved difficult, and I never got around to it before I decided that using the outfit selector daily was too tedious.

Even with my planned features, I’m not all too sure how useful the program would be (to be fair, I also just like the fun of picking out clothes in the morning). The counter for cleaning out your closet annually sounds useful, but you could do that just as easily with pen and paper, or even an online note taker. Combined with an automatic outfit selector, though, it may prove to be useful (provided you remember to run it every morning).

But there are just too many situations where I’d want total control of what I wear: an interview, seeing old friends, a house party, a night out, etc., for me to rely on the program to count everything—though things might be different if you could increase the count manually.

On second thought, it’s quite possible I’m wrong about this program and those like it. Download it, edit it, and try it out for yourself. Maybe you can find a practical use for automatic outfit selection where I couldn’t.

As always: Have at it, and tell me how it goes.

3D Printing Molecular Models for the Scientists That Discovered Them

First, a quick life update: for the past two weeks, I’ve been working as an intern for the Chemical and Biophysical Instrumentation Center at Yale. This summer, I’m mostly doing work on software projects, with the primary goal of furthering the open-source NMR initiative OpenVnmrJ.

As a side project, I’ve also been working with their newly acquired 3D printer to create molecular models. It’s a rather good idea in theory: if you could just print real, physical models of complex molecular geometries, it would be a massive step up from a computer screen in terms of visualization.

But as it turns out, 3D printing even the simplest molecules isn’t nearly as easy as slicing the G-code and hitting “print,” and so the center has run into a lot of problems along the way. I was lucky enough to help out with fixing these issues over the summer. For anyone who wants to do the same, I’ll be documenting some common problems and solutions soon.

Dr. Patrick Holland holding… a molecule that I forgot the name of. Oops.

Once we got it working though, we were able to do some pretty awesome stuff. First, I came up with a simple way to edit the mesh generated by Mercury to allow for rotating bonds! This is apparently a pretty important feature that a lot of the (surprisingly large) molecular model 3D printing community has been requesting from the CCDC for quite some time now, and so we’ll likely be publishing our result!

Another great thing we’ve been able to do is gift personalized 3D printed molecular models to their discoverers: Yale chemists, crystallographers, and physicists. It’s been an awesome past few days, giving sciency gifts to some of the most accomplished people in their respective fields, and I’ve made a lot of new friends along the way.

Dr. Brandon Mercado, the CBIC’s x-ray crystallographer with his fullerene molecule

I can only imagine how surreal it must feel to study a molecule for months or even years (its structure, forces, fields, effects, potential uses, and so on) and then see and feel a tangible model of the thing in your hands. It’s really humbling to have been a part of bringing that to them.

I wanted to show how scientists look when they get to hold their own molecules in model form. I think they’re all adorably happy, and I hope it humanizes them while reminding us of how much scientists do to further human knowledge. There’s generally a lot of hype and media attention around science itself, but not a lot of appreciation for scientists, save for a few big names. I’m hoping this adds to that appreciation.

Cheers to scientists!

Other 3D Printing News


For those interested in all the other stuff I’ve made over this summer, here’s a quick snapshot. I’m sure it won’t disappoint.

One of the first tasks I was given was to repair a set of broken hooks that were once used to close the IR spectrometers. Because of a poor design, both machines’ hooks had snapped at the top. See for yourself:

Notice the superglued bit at the top; I put it back together briefly to measure it.

This was clearly a job for 3D printing: a relatively simple, small geometry that we had a physical model of. I took the calipers to the hook and whipped up a simple solution in Solidworks. Here’s what the final model looked like in action:

I added some extra mass to the side where space allowed to ensure that my printed hooks wouldn’t snap like the old ones. There’s also a nub for holding the horizontal metal bar in place, which adds a locking mechanism and a satisfying “click” when you press it in (which has the added benefit of making sure people don’t keep trying to push it after it’s already locked, i.e. how it probably broke in the first place).

Next up, I printed a model of something called a Geneva drive, which translates continuous rotational motion into discrete rotational motion. It’s what they used in old film projectors to move from frame to frame without choppiness. It’s hard to describe how it works in words, so just check it out yourself:

https://gfycat.com/BlaringValidIndri

That famous clacking sound you hear when old-timey films play is actually the sound of the Geneva drive mechanism rotating quickly. Who would’ve thought?

Anyways, this post would quickly reach an unreasonable length if I went over all the neat stuff we printed this summer. To get a sense of it all, here’s a final shot of just some of the things we made:

Yes, that’s a fidget spinner. I regret nothing.

By far, the majority of these objects were either molecular models, different prototypes for the rotating joint, or combinations of the two. I’ll be sure to post on this once our findings are released more officially.

Also, I ordered my own 3D printer to use at home (I think I’m addicted), and I’ll keep you updated on any significant projects I finish involving 3D printing.

And that’s all for now!

Zipf’s Law and Your Facebook Messages

A lot of social media platforms have a bad tendency to give out your personal data to advertisers. Luckily, they’re usually kind enough to give some of it to you too, if you know where to look.

For Facebook, it’s a quick Google search away; you can download a copy through your general account settings. All things considered, it’s a pretty huge wealth of data. Imagine just your messages alone: a downright massive wall of text containing countless interesting insights about you and the people you talk to. The only catch is that you have to be curious enough to sort through it all.

The “obvious” way to analyze text message data is to find out which words you used most often, and that sounded good enough to me. If you’d like to read about how I actually did the analysis, or want to do it for your own Facebook messages, you can check out a download of the script and an explanation of it here (Yikes! It’s still under construction).

Otherwise, let’s take an in-depth look at how I talk to people on Facebook.

Three Hundred Twenty-three Thousand Nine Hundred Forty-seven


—the total number of words I’ve sent through Facebook Messenger since I joined, ignoring punctuation (which has the effect of counting contracted words as one), numbers, emoticons, and generally anything that isn’t an English letter. It’s a pretty stupefying number.
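If you’re wondering how that counting works in principle, the core of it is just stripping out everything that isn’t an English letter and tallying whatever is left. Here’s a minimal sketch of that step, assuming you’ve already dumped your messages’ raw text into a single file; the file name and function here are placeholders, not the actual script:

```python
import re
from collections import Counter

def count_words(raw_text):
    """Lowercase the text, drop anything that isn't an English letter or
    whitespace, and count what's left. Stripping apostrophes is what makes
    contractions count as single words ("didn't" becomes "didnt")."""
    cleaned = re.sub(r"[^a-z\s]", "", raw_text.lower())
    words = cleaned.split()
    return Counter(words), len(words)

with open("my_messages.txt", encoding="utf-8") as f:
    counts, total = count_words(f.read())

print("Total words:", total)
```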

And of course, we’re going to do the old back-of-the-napkin time conversion. The average English word is about 5 letters long, and the average person types English at around 200 letters per minute. These are both pretty dubious estimates since:

a) The average word I’ve sent in a text is probably shorter than the average English word.
b) I’ve sent the majority of my Facebook messages through my phone, which means I also probably typed slower than the average keyboard typing speed.

I’m hoping the errors will sort of cancel out, though, since they work against each other in theory. Anyways, 323947 words times 5 letters per word divided by 200 letters per minute works out to:

8099 minutes or just about 135 hours.

That’s 5.6 days of my life spent just typing Facebook messages. It’s at least a little embarrassing—putting that figure out there.

And remember, that quantity doesn’t even account for the fact that:

  1. Any message where I sent only an image, an emoji, or a sticker (hey, those were popular at one point) doesn’t contribute at all to the final word count.
  2. I’ve probably spent a good additional portion of that time just thinking about what to send.
  3. I’ve deleted a number of conversations with people that I’ve probably written a huge number of messages to.

Before taking a look at my word frequency though, here are some other fun stats:

I’ve sent (at least) 83874 messages with text over the course of 7 years and 10 days, with an average of 3.6 words per text message (excluding outliers–there were enough to make a difference) and 32.7 messages per day. Using the above estimate of 135 total hours, I’ve spent about 0.2% of all my time since I joined Facebook typing messages on their service (Yowza!).

I’ve sent 22848 messages to my “most talked to” person, which accounts for just about 27% of all my sent messages. 
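If you’d like to double-check the arithmetic, everything follows from the totals quoted above; here’s a quick sanity check (ignoring leap days):

```python
# All numbers come straight from the figures quoted in this post.
total_words = 323_947
total_messages = 83_874
days = 7 * 365 + 10                    # 7 years and 10 days

print(total_messages / days)           # ~32.7 messages per day
print(total_words / total_messages)    # ~3.9 words per message (before trimming outliers)
print(135 / (days * 24) * 100)         # ~0.2% of all hours since joining Facebook
```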

Word Frequency


So, which words did I use the most often?

Just so you know, before I did the analysis, I removed any “stop words” (i.e. common, uninteresting words like “but,” “the” or “and”).
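Mechanically, that’s just a set lookup before ranking. A sketch of the idea might look like this; the stop-word list here is a tiny stand-in for whatever full list the script actually uses:

```python
from collections import Counter

# A tiny stand-in stop-word list; the real one is much longer.
STOP_WORDS = {"the", "and", "but", "a", "i", "you", "to", "it", "is", "of"}

def top_words(counts, n=50):
    """Drop stop words from a word-frequency Counter and keep the n most common."""
    filtered = Counter({w: c for w, c in counts.items() if w not in STOP_WORDS})
    return filtered.most_common(n)

# `counts` would be the full Counter built from the message text,
# like the one from the earlier sketch.
example = Counter({"u": 2481, "the": 9000, "think": 1472, "and": 8000})
print(top_words(example, n=3))   # [('u', 2481), ('think', 1472)]
```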

With those out of the way so we can see my own unique word habits, here are my 50 most used words and their usage amounts in order of frequency:

['u' '2481']
['think' '1472']
['oh' '1323']
['know' '940']
['shit' '834']
['ill' '823']
['time' '823']
['fuck' '811']
['lmao' '808']
['idk' '784']
['want' '780']
['probably' '771']
['actually' '750']
['lol' '745']
['ur' '667']
['w' '657']
['kinda' '632']
['going' '609']
['thing' '604']
['bc' '596']
['nice' '595']
['thanks' '576']
['haha' '555']
['rip' '552']
['rly' '547']
['dude' '545']
['bad' '542']
['hey' '538']
['need' '532']
['tho' '525']
['people' '524']
['yes' '509']
['feel' '505']
['make' '504']
['wow' '491']
['didnt' '483']
['youre' '477']
['lot' '472']
['pretty' '470']
['better' '459']
['guess' '445']
['wait' '437']
['day' '432']
['today' '428']
['maybe' '414']
['tomorrow' '403']
['things' '403']
['theres' '403']
['man' '399']
['fun' '390']

So here’s what I first noticed from this list, but tell me if you find anything else interesting:

“U” is my most common word, and the way I use it isn’t even valid English. Actually, “you” was in the list of stopwords I removed, but I wanted to keep abbreviations and other quirks since they help define my own personal texting patterns.

I use “think” a lot more than “know,” so maybe I make more claims in messages that I can’t back up. It’s possible a “think-to-know ratio” could be a decent word statistic to measure someone’s confidence in their ideas.

Two profanities (“shit” and “fuck”—classy) made it to my top ten, which should make sense to us; they aren’t in any list of stop words and, all things considered, they’re pretty versatile words.

Ten out of fifty (20%) of my top words are abbreviations (“u,” “idk,” “lmao”). I don’t know how that compares to the texting average, but it seems fairly reasonable.

Do my top words change based on the people I’m talking to? Well, you be the judge.

Here’s the list of my top words to a longtime good friend of mine:

['u' '189']
['haha' '135']
['shit' '120']
['think' '79']
['kinda' '74']
['idk' '67']
['oh' '65']
['dude' '59']
['fuck' '58']
['probably' '58']
['rly' '52']
['alright' '45']
['lmao' '45']
['ur' '44']
['know' '39']
['time' '38']
['want' '33']
['whats' '33']
['didnt' '32']
['ill' '31']
['way' '30']
['w' '29']
['rn' '29']
['feel' '28']
['theres' '26']
['thing' '26']
['things' '26']
['thought' '25']
['bc' '25']
['man' '24']
['wait' '24']
['high' '23']
['maybe' '23']
['need' '23']
['actually' '23']
['lot' '23']
['p' '22']
['guess' '21']
['nice' '21']
['huh' '21']
['said' '21']
['holy' '20']
['n' '20']
['say' '20']
['tho' '20']
['read' '19']
['mean' '19']
['make' '19']
['abt' '19']
['better' '19']

Compare it to the list of my top words to an acquaintance, who I’ve talked to just a few times in real life.

['u' '55']
['shit' '30']
['w' '22']
['hi' '21']
['actually' '20']
['idk' '19']
['probably' '19']
['math' '18']
['hey' '18']
['thanks' '18']
['want' '18']
['ill' '17']
['home' '16']
['know' '16']
['bc' '16']
['time' '16']
['wow' '16']
['fuck' '15']
['oh' '14']
['think' '14']
['balm' '14']
['yep' '13']
['hello' '13']
['tho' '13']
['ur' '13']
['class' '12']
['rip' '12']
['yes' '11']
['theres' '11']
['lol' '11']
['tomorrow' '11']
['need' '11']
['doing' '10']
['thing' '10']
['p' '10']
['phys' '10']
['lmao' '10']
['kinda' '10']
['lot' '10']
['nice' '10']
['wait' '9']
['didnt' '9']
['maybe' '9']
['fine' '9']
['k' '8']
['mm' '8']
['stop' '8']
['check' '8']
['rn' '8']
['work' '7']

Also, the words I’ve sent the most to all guys compared to girls…

Guys:

['u' '858']
['oh' '488']
['think' '462']
['lol' '403']
['fuck' '389']
['know' '352']
['actually' '340']
['wow' '322']
['ill' '313']
['ur' '286']
['time' '279']
['want' '267']
['going' '256']
['shit' '251']
['idk' '239']
['hey' '238']
['thing' '234']
['probably' '233']
['wait' '232']
['people' '220']
['need' '203']
['wtf' '200']
['make' '199']
['pretty' '199']
['tho' '193']
['kinda' '181']
['guess' '178']
['w' '176']
['rip' '176']
['yes' '174']
['didnt' '169']
['bad' '169']
['thanks' '165']
['better' '163']
['theres' '162']
['hes' '158']
['lmao' '157']
['dude' '156']
['said' '155']
['omg' '154']
['tomorrow' '148']
['youre' '147']
['lot' '145']
['mean' '145']
['game' '145']
['look' '142']
['p' '141']
['maybe' '140']
['nice' '139']
['say' '135']
['math' '133']

Girls:

['u' '1653']
['think' '1020']
['oh' '841']
['lmao' '684']
['know' '594']
['shit' '586']
['idk' '549']
['time' '548']
['probably' '544']
['want' '513']
['ill' '511']
['bc' '498']
['w' '497']
['nice' '456']
['kinda' '454']
['rly' '448']
['haha' '441']
['fuck' '428']
['actually' '419']
['thanks' '418']
['feel' '409']
['dude' '389']
['rip' '387']
['ur' '384']
['thing' '379']
['bad' '377']
['going' '355']
['lol' '349']
['yes' '336']
['tho' '334']
['youre' '332']
['need' '331']
['abt' '330']
['lot' '330']
['day' '324']
['today' '322']
['didnt' '320']
['things' '315']
['make' '312']
['people' '307']
['better' '305']
['hey' '301']
['fun' '289']
['yep' '288']
['b' '282']
['pretty' '275']
['man' '274']
['maybe' '274']
['alright' '271']
['ppl' '270']
['guess' '270']

There aren’t too many differences, but I did notice that “thanks” is ranked a lot higher for girls than guys.

Also, “today” is in the top 50 for girls but not guys, whereas “tomorrow” has the opposite case. Bros before hoes? Not for me, I guess.

Zipf’s Law


One really interesting phenomenon is that, taken as a whole, the total distribution of my word counts (i.e. including stop words) in order of frequency appears to closely follow a trend described by something called Zipf’s law.

The quick rundown is that Zipf’s law is a quirk of probability distributions first formalized by George Kingsley Zipf, who found that a lot of data sets in the physical and social sciences can be approximated by just one distribution—called a Zipfian.

An easy way to describe a Zipfian distribution is that it follows a power law (i.e. if you plot it on a log-log scale, it’ll look roughly linear). Zipf’s law thus predicts that, given some long-form text written in a natural language, the frequency of each word will be inversely proportional to its rank (i.e. the 1st most common word is twice as common as the 2nd, three times as common as the 3rd, etc.).
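If you want to check your own messages against the law, the test is straightforward: sort your word counts, plot frequency against rank on log-log axes, and see whether you get something close to a straight line. A minimal sketch, assuming you already have a word-frequency Counter like the one built earlier:

```python
import matplotlib.pyplot as plt
from collections import Counter

def plot_zipf(counts):
    """Plot word frequency vs. rank on log-log axes; a roughly straight,
    downward-sloping line is the Zipfian signature."""
    freqs = sorted(counts.values(), reverse=True)
    ranks = range(1, len(freqs) + 1)
    plt.loglog(ranks, freqs, marker=".")
    plt.xlabel("rank")
    plt.ylabel("frequency")
    plt.title("Word frequency vs. rank")
    plt.show()

# `counts` would be your full word Counter (stop words included this time).
plot_zipf(Counter("the the the the of of of u u think".split()))
```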

In our case, the data set of my most used words can be considered a social science data set, and it’s astonishing how closely it follows the distribution. Take a look for yourself:

My top 5000 words in order of frequency.

vs.

An actual Zipfian, randomly generated by someone online.

Interestingly enough, Wikipedia also published a graph of their own word frequencies, taken from a data set of 10,000,000 words randomly chosen from their own archives; they’ve noticed it also closely follows a Zipfian, enough to include the graph on their article about Zipf’s law. How meta!

Wikipedia’s Zipfian word distribution (It looks different since it’s plotted on the log-log scale).

vs.

Our own data plotted on log-log for comparison.

So, how about that? Some guy who was alive a century ago predicted how often I use words in text messages.

Over the course of some 7 years of texting, I’ve managed to produce a data set that matches Zipf’s prediction. I think that’s really something special.

What other interesting things can you find in text message analysis (either from looking at trends in my data set or your own)? I’m sure there are insights I’ve missed, and interesting comparisons I didn’t try.

There’s also so much more to explore than just the word frequency analysis I did here. Have at it, and tell me how it goes.

Bonus: an adorable Zipfian dino—source unknown.

Confused Words

Lots of English words have more than one distinct definition. You probably already know these words as homonyms, but we’re going to be focusing on homographs today.

Homographs are words that are spelled the same way but have multiple meanings, regardless of pronunciation. So, for example, lead (the metal) and lead (as in “to conduct”) would still count, while they generally wouldn’t under the homonym definition. This is an important distinction to make, since the medium we’ll be using is exclusively writing.

Here’s a diagram illustrating the differences between the terminology. Notice which sections the line for “homograph” covers.

There are the obvious ones, like “baseball ‘bat’” and a “‘bat’ with wings,” but I like the weirder examples that might not come to mind right away. “Parabolic” can mean “of, or relating to a parabola” or “of, or relating to parables” (you know, like the short stories).

Now here’s the fun part: make a sentence that confuses the two, so that you can’t tell which definition it’s using. Let me try first, as a quick example:

The story was parabolic.

Okay, this one’s a bit of a stretch, but hear me out. The obvious way to interpret it is to think of the sentence as describing a story that’s a bit like a parable. So, it’s parabolic. On the other hand, most people actually use “parabolic” to describe the math curve, and that tends to be the much more popular definition.

So a person who doesn’t know that “parabolic” can also describe the type of simple stories embedded with a moral message might confuse it as a sort of metaphor for the story’s structure: “That story started and ended at the same level of intensity, with a peak in the middle.”

You can make the word confusion I used more obvious with some context (read it like a pretentious book review):

The story is reminiscent of those ancient ones that attempt to illustrate a moral or spiritual lesson by the end, complete with a steadily increasing pace until it reaches its poignant and climactic peak, where it mellows quickly at first, slowing down abruptly as it approaches a satisfying resolution. You may describe it as parabolic.

You can also confuse it between two different nouns by using an unclear pronoun like “it”:

“That curve you drew looks like the path of a ball that was thrown through the air, and the book you put next to it reads like a Gospel story used to illustrate a moral lesson to kids.”

“Yeah, it’s parabolic.”

So that’s all well and good. Now let’s try it again, this time with another homograph.

Following our trend of using math curves, we’ll use “hyperbolic” this time, which can either mean “of or relating to a hyperbola” or “of or relating to hyperbole.” This one is better in the sense that both definitions are used pretty often, but worse in that it seems a little more difficult to come up with a sentence that can confuse them. There isn’t really a metaphor that applies for the math curve definition like last time, so we’ll have to take it literally.

I wrote a context that I think works pretty well, though, and in an interesting way:

“Check out what I wrote to describe this cool property of a math thing I graphed today.” Tom spoke to his friend with a calculated, relaxed manner.

Tim was always rather skeptical, but he still read onwards: “Relating to the set of functions mapping a symmetrical, open curve formed by the intersection of a circular cone with a plane at a smaller angle with its axis than the side of the cone.”

Tim retorted, “That’s way too much description for a simple maths concept. Why, it’s practically the definition of hyperbolic!”

Did you catch that? The meaning of “hyperbolic” changes based on how you actually read the last sentence, which comes down to which words you choose to stress. Compare:

“It’s practically the *definition* of hyperbolic!”

vs.

“It’s practically the definition of *hyperbolic*!”

Isn’t that cool?

So now we have a total of three fairly distinct ways to confuse adjective homographs: with multiple possible descriptions, nouns, or word stresses. There are probably more distinct and interesting ways, so let me know if you find any.

Okay, what else? As a challenge, we could try to confuse more than two distinct definitions. The list of homographs with three or more distinct definitions is much shorter than the list with just two (the words themselves tend to be shorter, too), and even shorter if you want all the definitions to be the same part of speech (previously, they were all adjectives).

But here’s a word that works—one that you’ve probably used before but might not think of as a homograph: row.

“Row” can mean a line of things, the act of moving an oar to move a boat, or a noisy fight (in which case it’s pronounced in a way that rhymes with “how.” Remember how we defined homographs earlier).

So this might be tricky, since “rowing” a boat is a verb and the other two are nouns. I’ll try my best:

There’s been a great deal of debate over the exact cause of the HMS Legendary’s recent disappearance in the north Atlantic Ocean. Many postulate that it was in fact the nasty altercation that broke out between crewmembers over the proper rowing technique; the famous lines of dozens of rowers—once renowned for their skill and teamwork—found themselves bitterly divided over how they should conduct themselves, which rendered their great ship immobile against the tidal threat.

If perhaps the crewmembers simply got along better, or they all decided on the same oar technique, or they were arranged in a non-linear formation that allowed them to move effectively despite the disagreement, then the ultimate fate of the ship could have been avoided.

In my opinion, only one thing is for certain: the row was at fault.

Hungry for more? Well, if we allow slang definitions, we can actually squeeze in at least five distinct meanings for “pot.” It can mean a cooking pot, a flower pot, the sum of bets made in gambling (“he raked in the pot”), marijuana, or a toilet.

I’m not even going to try to be creative in confusing these, but here goes:

George was quickly closing in on Jonathan’s location. They were both problem gamblers, but George had real power in this city. Jonathan? Just some chump who managed to cheat and swindle a load of reefer off of a dangerous dealer and his cronies. They found out, and now they’re after him.

Jonathan thought he would be safe, though. After all, he had plausible deniability; the drug was hidden inside an incredibly elaborate setup. The bag had been placed under the soil of a flower vase, which itself was in a sealed cooking vat filled with packing peanuts, and the entire setup was submerged in the toilet of an outhouse next to his mobile home. “Even with dogs, there’s no way he’d find it,” thought Jonathan, while hiding away his prize.

The dealer arrived in a white pickup truck along with four of his men. He slowly climbed out of the car, and then, much to the surprise and chagrin of Jonathan, exclaimed, “Don’t even try. I know exactly where yousa hiding it.”

He pointed to the outhouse. “Hand over the pot.”

So there you have it: five distinct definitions of “pot” contextually confused with each other. It’s convoluted, sure, but also pretty hilarious. Come up with more, I guess. That’s all I got for now.

As always: Have at it, and tell me how it goes.
