Centralized Logging Solution for Google Cloud Platform (Cloud Next '18... (1)
[MUSIC PLAYING]
MARY KOES: Thank you all for coming
on one of the later speeches of the conference today--
super excited to have you all here.
I'm Mary Koes.
I am a product manager with Stackdriver Logging.
EDUARDO SILVA: Hi.
My name is Eduardo.
I'm a software engineer at Treasure Data.
MARY KOES: And we're here to talk to you today
about getting started running a centralized logging
solution on Google Cloud.
And I have to say nothing but my deep love of logs
would convince me to stand up on stage here.
So I'm really excited to get to share with you
why centralized logs can make your life easier,
how to do that in Google Cloud.
Eduardo will talk a little bit about logging in Kubernetes
specifically, and then some insights from managing
logs at scale that we've had--
we've learned, sometimes the hard way, through Google.
So let me see just a show of hands.
Has anyone-- do we have any current Google Cloud
customers in the audience who use Stackdriver Logging?
Awesome.
And another show of hands-- do we have any software engineers
from Google here who use logging in their day-to-day life
at Google?
Yeah, they don't really let us into these.
Oh, we have one.
Yay.
A couple.
So they don't usually let us into these talks,
so that's awesome.
The team that I work with in Pittsburgh
does both the logs to help all of our Cloud customers
manage their operations at scale,
but also the logs that our Google engineers use
for understanding the applications we develop
internal to Google as well.
So let's talk about why logs matter.
Fundamentally, troubleshooting-- anything,
but especially distributed microservices is really hard.
So we have, in the cloud, a bunch of interconnected pieces,
like this puzzle.
And when something goes wrong, it's
really hard to figure out how we need to put them back together.
Logs will give us visibility into our system.
So in the puzzle analogy, they're
the picture on our puzzle pieces here.
And while you could assemble things
one puzzle piece at a time, a centralized log view
will allow you to see all of the picture
at the same time, which will help
you to improve your reliability, performance, and security.
So let's talk about how we actually
deliver on a centralized logging solution running
in Google Cloud.
And I mentioned I am the product manager for Stackdriver
Logging.
That is one of three pillars that we
have within Stackdriver, so we've
got logging, monitoring, and application performance
management, or APM.
And each of these are part of a SaaS ops suite
to help all of you monitor and troubleshoot your applications,
whether they're running on GCP, AWS, or cloud
native infrastructure, and trying to keep
your apps fast and available.
So we have a bunch of different components here.
We have log producers on the one side,
and there are many different ways logs come in
that you care about.
So we have audit logs, which are super special.
We've got platform logs that are sent from individual Google
services like Cloud Load Balancer.
We have open source components that
are creating logs that you might be running
to power your infrastructure in your apps,
and then there are logs that come directly from the apps
themselves.
On the other hand, we've got consumers of logs.
So there are a number of different options
for consuming logs.
One, of course, is Stackdriver Logging.
Some users will also want to consume logs
in a tool like Google BigQuery or save them
for compliance in Google Cloud Storage
or use one of the many different partner applications
that we support, including Sumo Logic, Splunk, and ELK Stack,
just to name a few of the more common ones.
So the log router is what actually glues these two things
together-- the producers and the consumers.
So if I look over here, I've got,
at the top, the log producers.
Everything gets centralized through the logging API.
And then we've got the log router,
which will help us to get the logs to the right endpoint.
And down at the bottom, we have the Stackdriver product
on the left-hand side, and then on the right-hand side,
we have other potential destinations,
whether it's cloud storage, BigQuery, or anywhere else,
which can go through Pub/Sub.
In order to get logs into Stackdriver,
there is zero setup for GCP audit
logs and logs from GCP services, so any
logs that are being created on your behalf by a GCP service.
And then if you want to instrument your code,
you can also use client libraries
to send logs to the API directly.
But the vast majority of our logs
that aren't sent from cloud services
actually go through our open source logging agent, Fluentd.
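One common way to instrument an application so that a logging agent can pick up its output is to write one JSON object per line to standard output. The following is a minimal sketch using only the Python standard library. It assumes the agent is configured to parse JSON lines; the `severity` and `message` field names and the `make_logger` helper are illustrative assumptions, not something the talk specifies.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "severity": record.levelname,    # assumed field name
            "message": record.getMessage(),  # assumed field name
            "logger": record.name,
        })


def make_logger(name: str, stream=sys.stdout) -> logging.Logger:
    """Hypothetical helper: a logger that writes JSON lines to `stream`."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False
    return logger
```

Calling `make_logger("web-store").error("checkout failed")` would then emit a single JSON line that an agent tailing standard output could forward as a structured entry.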
EDUARDO SILVA: Hello.
Maybe most of you already know what a logging agent is,
but if we go like years ago, we will say that,
and we can think about logging.
The first thing that we can think about is maybe syslog.
Have you used syslog before?
Please raise your hand.
So everybody who raised their hand is over 30, [LAUGHS]
actually.
Yeah.
So logging mechanism exists for a couple of reasons,
maybe because you want to troubleshoot,
you want to monitor something, or understand
what's going on in your applications in your system.
So what we're talking about here, logging is not new.
I think that logging exists from the beginning,
since if you're doing development, you're debugging.
You have some kind of logging mechanism.
If you are deploying your application,
you have some logging mechanism for that.
It could be through syslog, rsyslog, or just writing
to a plain log file in the file system.
But we have current problems now.
In the past, we used to have, like, common messages.
The application is doing A, B, C, D, or it's crashing.
We have some exceptions.
But what happens when you have multiple applications that
generate data in a different way?
For example, if you know Apache web server,
Apache web server creates their own log entry,
saying this is an IP address, this
is the protocol version, the size of the document.
If you look at, for example, MySQL logs, they
look pretty much different.
What kind of query was running?
Was there an exception?
How are the tables?
Do we have some sharding or things like that?
So if you want to do logging, it's
not because logging is cool.
I think that, honestly, logging is boring from a--
MARY KOES: It's cool.
EDUARDO SILVA: --from a play perspective,
but you have to deal with that, because you have problems
that you need to solve.
So the thing is if you have multiple sources of information
that come from different places in different formats,
the ultimate goal that you want to do is data analysis.
So in order to do data analysis, you need a logging agent,
and this logging agent must have different strategies
to deal with data formats, data sources.
One thing is to read log files from the file system.
The other is to listen on TCP or UDP.
If you're using, for example, hardware devices
like firewalls, and they are sending their logs over UDP,
you need to support that too.
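To make the format problem concrete, here is a minimal sketch of the kind of parsing an agent performs: turning one line of an Apache access log into a structured record. It assumes the Apache Common Log Format; the `parse_apache_line` function and the output field names are hypothetical, chosen only for illustration, and are not Fluentd's API.

```python
import re
from typing import Optional

# Apache Common Log Format:
# host ident user [time] "method path protocol" status size
APACHE_CLF = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_apache_line(line: str) -> Optional[dict]:
    """Return a structured record, or None if the line does not match."""
    match = APACHE_CLF.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # Apache writes "-" when no response body was sent.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    record["source"] = "apache"  # tag the origin for later analysis
    return record


sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326')
record = parse_apache_line(sample)
```

A MySQL log or a firewall message arriving over UDP would need its own parser, but each could emit the same kind of dictionary, which is what makes downstream analysis possible.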
So we are moving forward.
We can think about many logging agents,
but the experience here is that Treasure Data, the company
that I work for, we created Fluentd around seven years ago
to solve the whole data collection problem.
And Fluentd is an open source project.
It's fully open source, and right now, it's part of
the CNCF, the Cloud Native Computing Foundation.
The goal of this project is to allow you to consume logs
from multiple places with different formats
and ingest back the information on any kind of central database
like Google Stackdriver or your own Elasticsearch
database or MySQL.
In general, as a project and ecosystem,
we have more than 700 plugins.
And I think that from a company perspective,
we maintain like 10 or 20.
And all of the others are made by the community.
Do you know Fluentd?
Has anybody here used Fluentd before?
Good.
We have some users.
MARY KOES: Awesome.
Thanks.
So once we get logs into the centralized logging
API, however that happens, and many times
through our logging agent, our next goal
is to get it to the right destination.
And this is where the log router comes into play.
So we have two kind of different flows for managing
logs through the log router.
One is called log sinks, and that is basically an inclusion
filter, so you're going to say exactly what it is you want
sent to these destinations.
And if you don't say anything, nothing
will go to these destinations.
We can send the same log to BigQuery and Pub/Sub
or to none of the above.
On the other side, we have logs being
sent to Stackdriver, the logging product,
and we manage this via an exclusion filter.
So if you do nothing, all your logs
will go to Stackdriver Logging.
And then if there are certain logs that you don't want to go
there-- either because you don't want anyone in your company
to have access to them or if you don't want to pay for them--
then you can use an exclusion filter
to control what goes into Stackdriver Logging.
And that's a pretty big difference
compared to many of the other cloud
platform logging experiences out there.
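The two flows can be sketched as a small routing function. This is a toy model of the semantics described in the talk, not the actual log router: the `Sink` class, the `route` function, and the destination names are hypothetical. It treats the two decisions as independent, so an entry excluded from Stackdriver Logging can still reach a sink.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

LogEntry = Dict[str, str]
Predicate = Callable[[LogEntry], bool]


@dataclass
class Sink:
    destination: str    # e.g. "bigquery" or "pubsub" (illustrative names)
    include: Predicate  # inclusion filter: only matching entries are sent


def route(entry: LogEntry, sinks: List[Sink],
          exclusions: List[Predicate]) -> List[str]:
    """Return every destination that should receive `entry`."""
    # Sinks are opt-in: with no sinks, nothing is exported.
    destinations = [s.destination for s in sinks if s.include(entry)]
    # Stackdriver Logging is opt-out: stored unless an exclusion matches.
    if not any(excluded(entry) for excluded in exclusions):
        destinations.append("stackdriver-logging")
    return destinations


sinks = [
    Sink("bigquery", lambda e: e["log"] == "audit"),
    Sink("pubsub", lambda e: e["severity"] == "ERROR"),
]
exclusions = [lambda e: e["severity"] == "DEBUG"]
```

With these example filters, a `DEBUG` audit entry is routed only to `bigquery`, while an ordinary `INFO` application entry goes only to `stackdriver-logging`.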
So I'll call out two important ideas around our log router.
One is that logs can be exported to log sinks,
even if they are excluded from Stackdriver log storage.
So this is something that can be a little bit
tricky to get from our documentation-- we'll work on
that-- but hopefully, clear now.
And then the other thing that's a really powerful tool
is a tool called aggregated exports.
So most of our customers have an organization
with more than one project, and they
want to manage logs across their entire organization.
A special use case here is security, where oftentimes,
the security team wants to see the audit logs
across their entire organization.
And so an aggregated export can be set up either at the folder
level or at the org level.
And it will inherit all of the stuff below it,
so if audit logs are created for a new project that
doesn't exist now in the organization,
we will go ahead and include those in the export as well.
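The inheritance behavior can be illustrated with a small model of the resource hierarchy. This is only a sketch of the idea, not how the service is implemented: the `parents` mapping, the resource names, and the `is_covered` function are all hypothetical.

```python
from typing import Dict, Optional

# Hypothetical hierarchy: each resource maps to its parent (None at the root).
parents: Dict[str, Optional[str]] = {
    "organizations/example-org": None,
    "folders/security": "organizations/example-org",
    "projects/web-store": "folders/security",
}


def is_covered(resource: str, sink_parent: str,
               hierarchy: Dict[str, Optional[str]]) -> bool:
    """True if `resource` is `sink_parent` or sits anywhere below it."""
    node: Optional[str] = resource
    while node is not None:
        if node == sink_parent:
            return True
        node = hierarchy.get(node)  # walk up toward the root
    return False


# A project created later is covered without touching the export itself.
parents["projects/new-app"] = "organizations/example-org"
```

Because coverage is decided by walking up the hierarchy, an export defined at the organization level picks up `projects/new-app` as soon as the project exists.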
And then the last thing I'll talk
about in terms of logging tools is how you
analyze logs with Stackdriver.
And we'll get to a demo here in just a moment,
but Stackdriver Logging supports both basic and advanced
filters.
We also have a tool called Error Reporting
as part of the logging library toolkit
here that will automatically go through your logs and group
like logs together, specifically looking for stack traces
and issues in your code.
And then we also provide the ability
to create alerts and dashboards from logs
by using a tool called Logs-Based Metrics,
and that pairs with Stackdriver Monitoring.
So that's kind of a theme that you'll
hear across the Stackdriver tools
is that while you can use logging by itself
or monitoring by itself or APM tools
by themselves, if you use multiple ones together,
you can actually get something that is more powerful than any
of them individually.
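A logs-based metric can be thought of as a counter of log entries that match a filter, bucketed over time so that a monitoring system can chart it or alert on it. The sketch below models that idea under simple assumptions: entries carry an integer `timestamp` in seconds, and the `count_matching` function and its parameters are hypothetical names, not a Stackdriver API.

```python
from collections import Counter
from typing import Callable, Dict, Iterable

Entry = Dict[str, object]


def count_matching(entries: Iterable[Entry],
                   matches: Callable[[Entry], bool],
                   bucket_seconds: int = 60) -> Counter:
    """Count matching entries per time bucket (keyed by bucket start)."""
    counts: Counter = Counter()
    for entry in entries:
        if matches(entry):
            bucket = int(entry["timestamp"]) // bucket_seconds * bucket_seconds
            counts[bucket] += 1
    return counts


entries = [
    {"timestamp": 5, "severity": "ERROR"},
    {"timestamp": 30, "severity": "INFO"},
    {"timestamp": 59, "severity": "ERROR"},
    {"timestamp": 61, "severity": "ERROR"},
]
errors_per_minute = count_matching(entries, lambda e: e["severity"] == "ERROR")
```

An alert could then fire when the count for any one-minute bucket crosses a threshold, which is the kind of pairing between logging and monitoring described above.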
And now my favorite part is demo time.
If we could cut over here, perfect.
All right, so I have a web store running on GKE.
And I can see the logs from my web store.
At first glance, I can see, hey, there are quite a bit of logs,
looks like a lot of errors.
But I'm not really sure where to get
started if there's an issue.