Charity Majors: Paul, have you ever been on call?
Paul Biggar: I've been on call a lot.
When I started Circle, I was on call from the first customers until they finally took my pager away three years later.
Charity: Oh wow.
Paul: I started as a primary on call and then some iteration.
I mean, at the start we didn't have PagerDuty, and we didn't have monitoring, which is the precursor to observability. You might have heard of it.
But you know, your Twitter blows up because your site is down, and you're effectively on call regardless of how you position it.
Charity: This has never happened to me.
Rachel Chalmers: So, this might be a great time for you to introduce yourself.
Paul: I'm Paul Biggar, I founded CircleCI, and more recently I founded a company called Dark.
Rachel: You and I met back in the CircleCI days.
What drew you to continuous integration in the first place?
Paul: There are a couple of different answers to that, but I remember being in a room in Paul Graham's house in 2010 when I was doing Y Combinator.
Rachel: Was this at the party?
Paul: Different party.
And we were spitballing. My Y Combinator start-up was stupid, and he was like, "Why don't you do compilers as a service?" And I didn't even know what that meant.
But a year, year and a half later, after I'd been working at Mozilla for a while, I realized that they had this big problem with the automated testing thing, and the release engineering, all this sort of thing.
I was not exactly directly involved in it, but I was a downstream user of their CI suite, basically, and I spent a year thinking, "You know, if I was in charge, I would do this differently."
Then when I decided it was time to do a start-up, this had been on my mind for a year.
Rachel: Tangentially, I kind of want to do a shout-out to Mozilla as kind of the Bell Labs of our industry.
There's so much amazing stuff and so many amazing people coming out of it, like the Rust language, and it's just a generator of cool.
Charity: It does feel like release engineering is something that is very systematically under-invested in
at pretty much every company over the size of 50 people that I can think of.
This is where faults get injected into our system, this is where chaos enters the system, and yet it's not seen as being prestigious work.
It's seen as being very laborious, it's seen as being the crap work that you do because you have to, not something that actually affects your life more than any other piece of code you can probably write.
Rachel: Well, there's been this truism on the finance side forever, that dev
tools don't sell and dev tools don't grow into big venture exits.
Charity: That's why we still have Capistrano.
Rachel: It's kind of crazy.
I'm not even sure it's true anymore.
Paul: I kind of think it's true.
I think up until recently the secret to selling dev tools was selling infrastructure, and so the companies that made money were selling infrastructure, with some exceptions like GitHub, who just had all the users in the world.
But I think there's--
Charity: Everybody tells you, "Sell to ops. They have budget, they have checkbooks. Devs don't."
Paul: I mean, almost every dev tool that's been successful, if it hasn't
been selling infrastructure it's been selling top-down to enterprise.
Charity: That does seem to be changing.
Paul: Yeah, I think so.
The number of people who are coding in the industry is rising at this astronomical rate.
Charity: And the tools are getting better. I remember when I left Facebook and I realized that you can now cobble together the same exact build-to-deploy pipeline using all of these smaller start-ups, almost all of which have been founded in the last five years.
Rachel: Right, that's one thing I see happening. A lot of the tools are invented inside huge companies like Mozilla, Google and Facebook.
People leave and then they do these start-ups, and suddenly you have this accessible toolchain.
Charity: Because they don't know how to live without it.
Rachel: Exactly, you get accustomed to that lifestyle.
Paul: The upside of that is obviously that you have the tool that you can use.
The downside is, you now need to know all these tools, and the complexity of the industry has been exploding as a result.
Charity: It's true, and there are very few reliable narrators when it comes to how
to plug them together and what you actually need, and what you don't.
Paul: Well, you obviously need to use the tool that the person on stage is telling you to use.
Charity: Well, of course.
Paul: And then some other tools as well that integrate nicely.
Rachel: You've talked a lot about accidental complexity, which I love as
a phrase for describing what's even happened since you founded CircleCI.
It's just a skyrocketing number of variables, number of abstraction layers that people need to get their heads around now.
Do you want to talk more about that?
Paul: I actually gave a bit of a talk about this at the Honeycomb meet-up a couple of months back, but basically when we started CircleCI people had a problem, and that was that their Rails monoliths took a long time to test.
Our product was, we take it, we parallelize it, it's great. In between then and now, microservices happened.
Microservices have been happening for 30 years under different names and so on, but people actually started doing microservices for the first time in history, I guess.
That completely changed how people tested, it completely changed what CircleCI's product is.
It also, I think, has had a complete change on the industry, even how people think about their code bases and splitting them across multiple--
Charity: And their teams, like the organizational structure. I think it's had a huge--
Rachel: And what they're responsible for. You used to be a sysadmin.
Like, "These hundred servers are mine. No one may touch them."
And now, what is it that you own? What is it that you're measured on? How do you define success in that role?
Paul: There isn't a right answer to any of it. There's a couple of opinions.
Charity: It's dizzying.
Everybody has advice for you, but it's always what they've seen work once.
Rachel: Right. Confirmation bias, "I did it this way and I succeeded, therefore the only way to succeed is to do it this way, in spite of the 99 other people who did it that way and failed."
Charity: Yeah.
Paul: We have a very fashion-oriented industry.
Rachel: We do!
Paul: Whoever writes the blog post that gets the most likes, that's the thing that becomes best practice.
Charity: The one that actually made sense to the most people.
Paul: Optimistically.
Rachel: Well, makes sense, or appealed to this week's aesthetic.
Charity: That's true.
Paul: Or is written by the famous person.
Rachel: Paul, to found one startup may be regarded as a misfortune, to found two smacks of carelessness.
Where did you find the courage to start Dark?
Paul: Oh my God.
So, this is my fourth startup.
Rachel: Four?!
Paul: The first two failed tragically.
Rachel: Do you need help? Is there something we can do?
Paul: So, apparently on the third one you start to get okay at it.
The first one after you make a successful one, they'll give you money without really too much work.
I actually had this sort of thought. I spent a lot of the intervening time between when I left Circle and when I founded Dark thinking, "What will I do with my life?"
And I had a lot of ideas that were mostly not venture-backed, that were mostly small, low-stress start-ups where you could sort of have a nice chill life but still have meaning and work, and that kind of thing.
That's not what I did, because every time I started thinking about, "How would I build those?"
I realized that the tool that I wanted to build them with did not exist.
Charity: This is how Parse got started too, you know.
They were going to build mobile apps and then they suddenly went, "Oh my gosh, everybody is doing all of this every time?"
It became Parse, because there's just so much boilerplate that you have to redo every time, and it's tiresome.
Paul: Yeah. You go to a hackathon for the weekend, and at the end you've got your webpack pipeline set up.
Charity: Yes, exactly.
Paul: So, our goal with Dark is very much like, reduce background work.
Charity: Yeah.
Paul: The reason I talk about accidental complexity so much is, our goal is basically
just putting a circle around all the accidental complexity that we can find and
seeing if we can remove it in a sort of a holistic back-end package.
Charity: Tell us what Dark is.
Paul: Dark is a tool to make coding a hundred times easier.
Specifically, to make back-end services easier.
So, you would go to Dark, you would use our editor, you would use our infrastructure, our compiler, and you would use our language, the Dark language.
Because you're using all this holistic stuff, you get a lot of stuff for free and that's basically what we're doing.
Charity: How do you know if it's working?
Paul: That is a very, very good question.
Charity: Interesting.
Paul: We're about six months into the development of it, maybe--
Charity: I meant, how do you know if your software is working?
Paul: Oh, how do you know if your software is working? Well, types, Charity.
Charity: Burn.
Paul: One of the things that we're making sure that we do with Dark is we're not making any new things, we're just bringing them all together.
So, the things that people use today to make their software work: types, fuzzing, testing, continuous integration. They're all part of it.
Charity: I think of all that as being the basics, right? I'm trying to gently nudge you into mentioning observability here.
Paul: Oh, I see.
So, actually Dark is really centered around the idea, or at least the concept I think, of observability. Because you're always writing in production.
Charity: Love it.
Paul: There is no separation of the code.
There's no process to take the code from your laptop into production.
Charity: All of those places are so fraught with errors and things that get dropped, which is why I love it.
The best software engineers I ever worked with at Facebook would spend half their day in their IDE writing code, and they'd wait for it to eventually make its way out to production, and then they'd spend the other half of their day in Scuba or ODS, just trying to understand the consequences and effects of what they had shipped, or what their intern had shipped.
Because the understanding becomes the hard part, much more than the development part.
Paul: When you think about how hard it is to replay a bug that a user had on your site--
Charity: Yes.
Paul: You're going to have to replay it through several microservices, and fetch it from
different logging mechanisms, and inevitably you're going to be missing something anyway.
Charity: This, to me, points to why it's so necessary that we get
comfortable with testing in production. Which is very much a Dark-friendly concept.
Paul: Absolutely. I totally believe in it.
Charity: I see teams flushing all this time and energy just down the toilet, trying to get staging in sync with production.
Which is actually, in fact, impossible, because every single time you deploy an artifact using a deploy script to production, that's a new thing. Right?
Paul: Yeah.
Charity: You can capture and replay the past, but you can't predict the future.
So, whatever you're doing on staging is inevitably dumb.
Rachel: It's theater.
Charity: It's theater, and it makes you feel good about yourself. We have limited cycles, and we are spending all of our time there, which means we're not spending it on hardening production.
Guardrails, making it so that you can actually see what's going on, so that you can slice and dice in real time, so that you can experiment.
Rachel: The guardrails are critical, though.
Paul, how do you think about making sure that testing in production manages failure in a graceful way?
Paul: I think feature flags are probably one of the best tools that we have for that.
In Dark, the way that you do it is that once users are using a particular route, that code is immutable. You can't change that code.
You can't edit it, there isn't a process of going into it and making a change.
What you can do is take a section of it and say, "I'm going to flag that off."
And you can run traffic both ways, and all that sort of thing.
Basically, what we as developers are trying to do is get some personal certainty that the code that we write is going to work.
The best way to do that is to take real traffic, run it through the code that we've just written in a safe way, and validate that the answers are correct, whether we're doing some sort of statistical analysis on it or just eyeballing the result.
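[Editor's sketch: the approach Paul describes, running real traffic through both the existing code and the new code and comparing the answers, might look roughly like the following in Python. This is not Dark's implementation. The names ShadowReport, shadow_compare, old_price and new_price are hypothetical, and the sketch assumes the handlers have no side effects, so running the candidate next to the live code is safe.]

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple


@dataclass
class ShadowReport:
    """Collects the outcome of running candidate code next to live code."""
    matches: int = 0
    mismatches: List[Tuple[Any, Any, Any]] = field(default_factory=list)
    errors: List[Tuple[Any, str]] = field(default_factory=list)


def shadow_compare(
    old_handler: Callable[[Any], Any],
    new_handler: Callable[[Any], Any],
    report: ShadowReport,
) -> Callable[[Any], Any]:
    """Wrap old_handler so every request is also run through new_handler.

    The caller always receives old_handler's answer. The candidate's
    answer is only recorded, so a bug in the new code never reaches users.
    """
    def handler(request: Any) -> Any:
        expected = old_handler(request)
        try:
            actual = new_handler(request)
        except Exception as exc:  # the candidate must never break live traffic
            report.errors.append((request, repr(exc)))
            return expected
        if actual == expected:
            report.matches += 1
        else:
            report.mismatches.append((request, expected, actual))
        return expected

    return handler


# Hypothetical handlers: the candidate forgets to clamp negative input.
def old_price(cents: int) -> int:
    return max(cents, 0) * 2


def new_price(cents: int) -> int:
    return cents * 2
```

[Wrapping the live handler with shadow_compare(old_price, new_price, report) and inspecting report.mismatches afterwards is the "eyeballing the result" step. A statistical check would aggregate the same records instead.]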
Charity: When you put it that way, it's insane that we haven't done this sooner.
Paul: That's my position, too. Thank you, Charity.
Rachel: It seems, though, like it would be very hard for legacy developers, developers with the older mindset, to embrace this.
Charity: I feel like, yes, it is hard for them to embrace it, but I find that often I have a hard time convincing people how easy it can be, if they just do the thing they want to do instead of the 10 or 20 steps before the thing.
This is a problem we have all the time too, where we're like, "No really, this is hard because you haven't been able to ask the right question.
It's incredibly easy if you can just ask questions with high-cardinality fields."
And it sounds like it's very much the same thing for you guys.
Paul: I think it's very much a case of showing them a demo of what they can do on their own data.
Charity: Exactly.
Paul: Obviously that's not necessarily an easy thing to do.
Charity: Yeah, but it's killer.
Paul: Our industry has a history of these amazing demos.
The world is changing as a result of these demos, and that's sort of what everyone really tries to do.
Charity: Got to show them on their own data, because then they know that you're not making it up, you're not cherry picking.
Paul: The other answer to that, and it's one I'm not particularly partial to, is that the industry grows at such an incredible rate.
The estimate for the number of programmers there are today is upwards of 50 million, and there'll be new people along all the time, and there are still people writing in COBOL.
Some of them retired, and some of them went away, and then some of them got bored.
Rachel: COBOL's a great language.
You and your co-founder Ellen publicly committed to diversity, while we're
talking about all of these new coders coming in.
Do you think Dark's culture affects what your code is like, and vice versa?
Paul: Absolutely.
We are big believers in inclusion.
It is one of our core values.
There's a couple of different reasons for this, and one of them just from a
business perspective is we want there to be a billion developers using Dark.
Obviously we're not going to get there if we don't open it up to way, way more people than are currently coding today.
I think, as well, in the current political climate it's very difficult to not look around and see all
the bad things that are happening and see the related situations in our industry,
and how we've made it not a great place for people of
color or for just generally anyone who's underrepresented in our industry. Non-white dudes, basically.
I guess it's fair to say, though, that we have both a business reason and
a values reason for doing that and it's sort of core to who we are.
Rachel: What's the advantage of getting a billion people using Dark, other than that you make a ridiculous amount of money?
Paul: When Ellen and I started working together, I'd drawn up this sort of values questionnaire,
and I had a lot of, you know, potential co-founders fill it out
and basically, making sure that we're on the same page.
And the page was that we're building something big.
I'm not going to all this effort in order to make a small side project, or whatever.
We're really doing a thing that we believe in, and a thing that we believe needs to exist in the world, that needs to exist for a lot more people, and it dovetails with a ton of different things and inclusion is one of them.
The answer to that question is, you know, "Why would you do it?" It's like, because that's what we wanted to do.
Rachel: Like Trudeau getting asked about all of the women in his cabinet and saying it's 2017.
Paul: Right, exactly.
Charity: We talked a little bit about being on call. A lot of engineers seem to regard this as a curse, a punishment, a thing that is being imposed upon them, a thing that has to be avoided at all costs.
What's your view?
Paul: Well, I think one side of it is definitely that people need their sleep, and being
on call is sort of damaging to our sanity, at the core of it.
Charity: There's definitely the flipside.
Ops has a long and sordid history of masochism and we cannot ask people to join us there.
Like, I'm over 30, I now want to sleep through the night too.
We just have to raise our standards for what we are willing to impose on people and participate in.
Paul: I loved the early Stripe story, where, and who knows how true these apocryphal stories are, but where they set an alarm for every single error they got. Wake them up in the middle of the night if there was any error at all.
I guess when you're dealing with payments, that's the sort of situation that you can put yourselves in, because you don't want to drop them.
But the idea is, when you keep it clean, the number of calls that you actually get is relatively low.
And the problem that I feel that people have when they're on call is that the costs of other people's code get externalized to them, to the person who's on call.
So, I mean, it's basically like, how much does your company value you? Are they putting you on call because someone has to be on call? We've done a really, really good job of making sure that it's as good an experience as possible.
Charity: Our on-call experience, it's a rare week whenever anyone gets woken up.
It's incredibly rare, and we always post-mortem it, and do everything we can to make sure it doesn't happen again.
Paul: Right.
Charity: I've been at many companies where that wasn't the case.
We just expected that you got woken up two or three times a night, you know,
and it's really hard to dig yourself out of that hole once you get into it.
Paul: Right.
Often when people interview, they ask you, "What's the on call going to
be like?" And you can tell just from how they ask what scars they have in the past.
Charity: Oh, trauma. Absolutely trauma. It does come down to valuing people's time.
I feel like every manager has a responsibility to, if not be on call themselves, it's not always possible, at least to fucking graph it, to know when your people are being woken up and have it impact you and take it seriously.
Give them the time and the permission and the space and the support
to pay down that technical debt so that it's not that bad.
Rachel: It's absolutely about taking responsibility, I think.
You talked about how resentful people get when they're the negative externality of somebody else's lazy code.
The advantage of putting engineers on call is they become responsible for
their own code and they appreciate the consequences of that.
But managers have to be respectful of people's time and of people's ability to affect the outcome.
The real burnout comes from not being able to make meaningful change.
Charity: A lot of engineers, because they're not exposed to that feedback loop, they don't actually learn how to write good software.
It's not that they're doing it on purpose, they just don't know, because they've never had that feedback loop of, "Oh, this is what happens when I do that, when I have this way of degrading that's not particularly graceful, when I don't shrink the critical path."
Paul: I think, you know, coming back to what we were talking about earlier
about microservices and continuous deployment, one of the best things that we can
do to reduce our critical path is lower the diff of what we're shipping.
Charity: More smaller changes--
Paul: And more certainty around what outcomes they're going to have.
Charity: Exactly.
I mean, this is just part of distributed systems, right? Failures happen all the time, and it has to be not that big of a deal.
Paul: No matter what.
Like, some day some shark is going to take a bite of an undersea cable--
Charity: Exactly.
Rachel: Cut off Australia entirely.
Charity: Well, what are developers missing about the future of software engineering and shipping quality code?
Paul: I think our feedback loops have gotten terrible.
Charity: Gotten terrible?
Paul: I mean, maybe they've always been terrible, but--
Charity: I think they are getting better honestly, and they've just always been bad.
Paul: I think back in the good old days, and by that I mean when I was in college and not writing actually valuable software, I actually think back to how we wrote software in college and how easy it was relative to what proper code bases are like today.
There was a feedback loop where you'd write something, and you tested it, and it's on your machine, it's not interacting. It's not a distributed system, I guess, is basically the thing.
And that hasn't really been brought back to distributed systems. Tools like Honeycomb are obviously doing this, CircleCI, as you know, is trying to do a little bit of it, Dark is going deep on it.
I remember there was a blog post a couple months ago by the Instagram engineering
team, and they talked about how they were saving data that happened in production, I think
it might have been in the case of exceptions, so that you could have it on
your machine, you typed a couple of commands, and you could actually replicate it yourself.
That's the world that we need to be going to.
Errors, exceptions, things going wrong--
Charity: Real data, real services, real networks, real traffic.
Paul: Exactly.
Charity: Absolutely. Couldn't agree more.
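[Editor's sketch: a minimal illustration of the idea discussed above, saving what happened in production when an exception occurs so the failure can be reproduced on a developer's machine. This is not the Instagram team's tooling. The names capture_failures and replay are hypothetical, the store is a plain in-memory list, and the sketch assumes the function's arguments are JSON-serializable.]

```python
import functools
import json


def capture_failures(store):
    """Decorator: on exception, save the call's inputs to `store`, then re-raise."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                # Record just enough to run the same call again later.
                store.append(json.dumps({
                    "function": func.__name__,
                    "args": list(args),
                    "kwargs": kwargs,
                    "error": repr(exc),
                }))
                raise
        return wrapper
    return decorate


def replay(record, registry):
    """Re-run a captured call locally; `registry` maps names to functions."""
    data = json.loads(record)
    func = registry[data["function"]]
    return func(*data["args"], **data["kwargs"])


captured = []


@capture_failures(captured)
def parse_age(raw):
    # Hypothetical production function that fails on bad input.
    return int(raw)
```

[After a failing call such as parse_age("abc") in production, a developer could run replay(captured[0], {"parse_age": parse_age.__wrapped__}) to hit the same exception locally. Using __wrapped__ calls the undecorated function, so the replay does not record a second capture.]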
Paul: Real traffic is an important one because it's very easy to--
Charity: It's easy to think that tests are reality.
Paul: Right.
Charity: That was me rolling my eyes.
Paul: Well, the tests are reality if you somehow live in a world where your system is entirely consistent.
Charity: Or, all of your clients are robots.
Paul: Yeah.
Charity: That would work too.
Paul: So, this is the problem.
If you're doing a test, you've written a couple of mocks or unit tests or maybe even integration tests, but they're not working at a scale where you might have a partition in your thing, or there just might be incredible load, or a hard drive is going wrong as it's being written.
You need to test under that world or else you can't really--
Charity: Exactly, and in distributed systems we just have this infinitely long tail of things that almost never happen. And once, they do.
And you can't predict and test for all of them, just like you can't predict and monitor for all of them.
And you shouldn't try.
You should be instrumenting your system at a level of abstraction that'll empower you to ask new questions.
Paul: I think fundamentally the problem is that most people are not writing distributed systems.
They're writing websites. Or web applications, which just happen to be distributed systems.
Charity: There's a great talk, I forget the name of the person who
wrote it, on why web programming is the original distributed system.
It is! We just aren't used to thinking of it and treating it that way.
That's why it has a bad reputation in terms of good quality.
Rachel: It does feel like there's an intellectual chasm that we have to cross between, you know, "I'm writing this to run on my web server," vs. "I'm writing all of these things to interact with one another on other people's clouds in real time, and if three of them go down the other 12 will take up the slack."
Charity: Our solution so far has just been, "We're just not going to do it, and say we did."
Rachel: If you're a young engineer coming out of Trinity's CS department today, how do you
prepare yourself for this very different world from the one we grew up in?
Paul: I think the obvious one is that you want to take
the Distributed Systems elective, which I did not do, and I've regretted for decades since.
It really depends on what you're trying to do as an engineer.
Are you trying to be on the ops-y side of things, making sure that systems stay up? Or are you going to be more on the product engineering side? Because you can't know everything.
Charity: I would argue though, that the fundamentals of operations are no longer optional.
I think that means understanding roughly what happens to your code after you hit publish, even if you're a mobile apps engineer.
You need to understand the fundamentals of what's going to happen when things start going south.
Paul: I'm not sure I agree.
Charity: Really?
Paul: I mean, I think that optimistically everyone would know everything.
Charity: I would not say that at all.
I'm just saying that if you can't model in your head roughly how failure works, your stuff is not going to be very good.
Paul: You're one hundred percent right.
Charity: Now you could say, "Well, stuff doesn't all need to be good," and I would say, "That's also true."
Most things fail and it's usually not because your code wasn't pretty enough.
Paul: I think back to younger years when people talked about, "Oh, you don't know what HTTP looks like, what TCP looks like," or, "You don't know all seven layers of the OSI model," and that sort of thing.
When people actually talked about, "This is a layer 4, and this is a layer 3, and--".
Charity: But I think that failure, and I'm not talking about any particular type of failure, just the act of making code reach humans and then sometimes not work. That seems like a pretty fundamental thing.
Paul: The rewards for making it reach the humans are far, far higher than the cost of it occasionally going down.
You get rewarded for building the thing, and probably someone else takes the slack when it goes down.
Charity: Well, we're hoping that this is changing.
Paul: I think the incentives around building also mean that it may not ever change.
I'm thinking specifically, you know, of when PHP came out and everyone was saying, "Oh, these PHP developers, they don't have any idea what they're doing," yet they're building the entire internet.
They're building Wikipedia, they're building Facebook and so on.
Rachel: Facebook is an interesting example, though, because what they've done with Hack, to just reinterpret PHP so that it works in a really modern distributed system, is kind of genius--
Paul: Seven years later? I mean how far were they and how successful
were they by the time that they actually started doing that?
Rachel: If you ask them.
Paul: So, they started HPHP in 2009, maybe.
And what, Facebook was four years old then? I'm not sure on my history.
And they already had a couple hundred million users. That's certainly the scale that they should have to rewrite it.
Charity: Some of this is obviously aspirational, absolutely agree.
But I think there's value in articulating what we aspire to as an industry. Because we can't just tell people, "Quality doesn't matter, go forth."
Because software is eating the world.
Every industry is now a software industry, and there are real costs to failure in industries.
Medical industries, building industries--
Rachel: The TSB migration that went south.
Charity: I mean, it's not just pretty web sites.
I feel like I hear more and more grumblings about our need to raise our standards as an industry, to be more like engineers, which is different than developers.
You can be a code monkey writing code, and there are more and more and more of those, and I don't mean that in a derogatory way.
But there's also software engineering, which I think should be more rigorous and should absolutely care about the quality.
Rachel: Certainly the civil and mechanical engineers would love that because they get a bit miffed when you talk about software.
Paul: I think I have the same goal as you, which is software works
better and fails less and we get woken up in the night less.
My belief of how we get there is not that we try to effect a change in humans, which I think people have been doing for a long time, but rather that we build better tooling.
Charity: I think I agree with you completely.
Paul: I don't think we can change how people think about the world, or the fact that there's someone today who wants to build a website, who's learning JavaScript, who will take off, and will absolutely not know anything about the system.
But if they have better tooling, if they built on Kubernetes because that's the thing that they were told to build on, and it becomes the default, then they've got a more reliable system than if they were hacking it together themselves.
Rachel: Tooling can change behavior, though.
It can't change human nature but it can encourage certain outcomes over others by gaming the incentives.
For example, if you can't tell whether what you've built is working
or not, you will build it differently than if you can.
And that comes back to the question of responsibility and ownership.
If you have agency over what your code does in production, if you can see and
affect that, then I think you feel a lot more affinity for it and for the users.
Charity: Nobody is going to want to put energy into caring about something that they cannot affect or change.
I mean, that's just wasted energy.
What are vendors and service providers missing about the future of software engineering?
Paul: I think there's a habit of vendors to think about the world as their place in it, and to think a lot about the competitive dynamics of the marketplace and how to make themselves more important than the other people in the space.
And I think what they're missing is that fundamentally a better
experience for users is the only thing that actually matters.
Rachel: Well, I think there's a huge distortion coming in from the finance side, particularly from
the very large school of venture capital which wants to create natural monopolies.
It's in some ways misaligned with what engineers are trying to do.
Good engineers are trying to build open platforms that enable people, and that kind of investment is trying to create closed platforms that take advantage of inequalities in the market.
So, I get very frustrated with this mismatch between the two biggest constituencies in venture-backed software: the entrepreneurs and, not all, but some of the investment community.
Paul: I think it's inherent, and I think it's definitely part of the venture-backed world.
Although, you also see a ton of bootstrapped people who have the same mentality.
You know, "We are the center of the world and everyone else will conform to who we are."
Charity: We all read the same blog posts.
Paul: I don't actually have any solution to it, unfortunately.
I wasn't coming in with a big principle here.
Rachel: We could overthrow capitalism, maybe?
Charity: Tear it all down.
Paul: I think that's probably the closest thing to achieving this.
Rachel: All right, I'll put it on my action items.
Paul: I'll get my red flag.
Charity: Awesome.
Thanks for coming.
Rachel: Thanks so much.
Paul: Thank you.