Centralized Logging Solution for Google Cloud Platform (Cloud Next '18)
because we need to create, or try to gather, some context.
And what do we mean by context? Your application is
running in a pod.
That pod has a name.
That pod has an ID, an identification.
You're also running in a namespace:
if you deploy a pod, that pod belongs to a namespace.
It's running on a node, on a host.
And maybe you added some labels and annotations to the pod,
for example a label saying that this pod
belongs to production, quality, testing,
or anything like that.
So how do you create all this context
for an application which is running in Kubernetes?
This whole context also adds more complexity.
And here is why.
The container name and the container ID
you can find in the file system.
And I'm talking here about the very low level
of how things operate.
You can get that.
But all of the other components come from the Kubernetes API
server, the master server.
So you have some information that lives locally
on the node and other information that lives
on a separate machine.
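To make the "local" half concrete: on each node, the container runtime writes one JSON log file per container, and the pod, namespace, and container name are encoded in the filename. The exact convention varies by Kubernetes version, and the path and names below are illustrative, not taken from the talk. A minimal sketch of pulling that local context apart:

```python
# Hypothetical filename under /var/log/containers/, following the common
# <pod-name>_<namespace>_<container-name>-<container-id>.log convention.
fname = "web-5d4f8b9c6-x2x7q_production_nginx-4f1d2c3b.log"

stem = fname.rsplit(".", 1)[0]           # drop the .log extension
pod, namespace, rest = stem.split("_")   # underscores separate the fields
container = rest.rsplit("-", 1)[0]       # strip the trailing container ID

print(pod, namespace, container)
# -> web-5d4f8b9c6-x2x7q production nginx
```

Everything else, such as labels, annotations, and the pod's UID, has to be fetched from the API server and joined onto each record by the agent.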
And you need your logging agent
to be able to bring all of this information together.
And that's why the logging agent that you use, and that
Stackdriver uses, needs to handle all of this for you,
because of course you cannot do this manually.
Your logging agent needs to be able to crawl the file
system, understand the messages, parse the messages,
which are in different formats,
and correlate all the information together
with the context coming from the API server.
And that's Fluentd.
Actually, Stackdriver uses Fluentd.
You will find it as Google Fluentd,
because it's a Fluentd package that's
ready to use with Google services, with
all the extensions and the integration
with Stackdriver right out of the box.
And once you have Fluentd as a log processor,
you can send things to Stackdriver, or maybe to
your own database.
Maybe you're running on-prem, or you have your own system
to store logs.
So that's why it's not easy to be a logging agent:
the tool also needs to be able to communicate
with different components over the network.
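That routing is expressed in the agent's configuration. As an illustrative sketch (tags and file paths are examples, not from the talk), a Fluentd configuration can send container logs to Stackdriver via the `google_cloud` output plugin shipped with the google-fluentd package, and everything else to a local file:

```
<match kubernetes.**>
  @type google_cloud        # provided by the google-fluentd package
</match>

<match **>
  @type file                # fallback: keep other logs on local disk
  path /var/log/fluentd/fallback
</match>
```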
Now, log processing in Kubernetes is like the next step.
Let's say that we have just a single node;
well, we'll have the master and a single node.
Fluentd, as a logging agent, needs to run as a DaemonSet.
Do you know what a DaemonSet is?
Are you familiar with pods?
We just spoke about them.
OK.
A DaemonSet is like a pod, but this pod
runs on every node of the cluster.
So if you append more nodes to your Kubernetes cluster,
like another machine, and you have already deployed
a DaemonSet, you're going to get a new pod from that DaemonSet
running on that new node.
So the way it works is that you deploy Fluentd as a DaemonSet.
Of course, this is already
working by default. What it does is run,
and it starts reading all the log files that we explained
a few slides ago, with all the content from the JSON
files that were created.
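A minimal sketch of such a DaemonSet follows; the names and the image tag are illustrative, not the exact manifest a managed cluster deploys. The key parts are the DaemonSet kind, which schedules one pod per node, and the hostPath mount, which gives the agent read access to the node's log files:

```yaml
apiVersion: apps/v1
kind: DaemonSet                  # one Fluentd pod per node
metadata:
  name: fluentd
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      containers:
      - name: fluentd
        image: fluent/fluentd:v1.2   # example tag
        volumeMounts:
        - name: varlog               # read the per-container JSON logs
          mountPath: /var/log
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log             # the node's own log directory
```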
Once it reads the whole content from the application,
it makes sure to gather the metadata, associating
every message with its container,
with its pod, with labels, annotations, and so on.
So the simple message that we started with at the beginning
becomes something like this, with context.
Because it has the Kubernetes host, pod name, pod ID,
container name, namespace, and so on.
And then, in Stackdriver or in any database that you're using,
you don't want to query, show me the messages that start
with "Hey Next."
You want to say: show me all the messages where,
for example, the Kubernetes container name equals "next."
So this metadata allows you to do better data analysis
and helps the storage solution surface the data that you really
care about instead of querying all of it.
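As an illustrative sketch (field names vary by agent and version; the values here are invented), the enriched record might look like:

```json
{
  "log": "Hey Next, hello world\n",
  "stream": "stdout",
  "time": "2018-07-24T10:15:00Z",
  "kubernetes": {
    "namespace_name": "production",
    "pod_name": "web-5d4f8b9c6-x2x7q",
    "container_name": "next",
    "host": "gke-node-1",
    "labels": { "tier": "frontend" }
  }
}
```

A query can then filter on `kubernetes.container_name` instead of pattern-matching on the raw message text.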
MARY KOES: Thanks.
So I mentioned earlier that we have the same team that
supports both our Cloud Logging solution and Google's
internal logging solution.
So we've learned a lot of lessons
talking to customers like yourselves and to customers
at Google
who use the logging tools that we build.
So I wanted to share some lessons learned.
One is the value of structured logs,
so not all logs are created equal.
They all have great potential, but if we format them
in a specific way, we're able to answer questions more easily.
So an unstructured log might have
a text payload of blah, blah, blah, purchased four widgets.
A structured log will have a lot of key-value
pairs in JSON format.
This makes it much easier, then, to answer questions like,
how many unique customers purchased a widget?
How many widgets were purchased overall?
If you use structured logs, then
when you, for example, send them to BigQuery,
that structure will persist, making
it much easier to interact with your logs downstream as well.
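A hedged sketch of the difference (the field names are invented for illustration, not a Stackdriver schema):

```python
import json

# Unstructured: one opaque string; counting widgets means string parsing.
unstructured = "blah blah blah purchased 4 widgets"

# Structured: explicit key-value pairs that log filters and BigQuery
# can query directly.
structured = {
    "event": "purchase",
    "customer_id_hash": "4a6f",   # illustrative field, easy to display in a UI
    "item": "widget",
    "quantity": 4,
}
line = json.dumps(structured)
print(line)
```

Questions like "how many widgets were purchased overall" become a sum over `quantity` rather than a regex over free text.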
Plus, you can do things in Stackdriver logging like say,
show me the customer ID hash.
I want that to be displayed easily,
so I can understand what's going on more quickly.
The other tip that we have learned over the years
is that it can be really tricky to get
the right level of logging.
If you do too much, it's crazy expensive,
and you can't find anything in your logs.
If you do too little, then you don't have the data
that you need when you need it.
And so trying to get that just right level of logs
can be a real challenge both internal to Google
and for our customers.
There's no magic bullet here, but one thing
we found to be really helpful is to enforce
a consistent definition for logging levels
through your code.
So an error should be something that really stopped your code,
something you expect somebody to actually investigate,
whereas debug and info are less important.
And that makes it easier, when I filter,
to quickly find the information that we care about.
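As a sketch of what a consistent convention might look like in application code (the level meanings here are one team's choice, stated in the comments, not a standard):

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("checkout")

# Convention enforced across the codebase (illustrative):
#   ERROR   the request failed; a human should investigate
#   WARNING degraded but recovered (retry succeeded, fallback used)
#   INFO    normal lifecycle events
#   DEBUG   verbose detail, disabled outside development
log.debug("cart contents: %s", ["widget"] * 4)  # filtered out at INFO level
log.info("order accepted")
log.error("payment gateway unreachable; order aborted")
```

Because the levels mean the same thing everywhere, filtering on `severity >= ERROR` reliably surfaces only the entries someone should act on.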
And the final kind of insight we've derived over time
is that it's really helpful to have data about the log volume
so that you can iterate and refine this over time.
So specifically, in Stackdriver Logging,
we have metrics at a very fine-grained level,
so you can see not only which resources are sending logs,
but also which kinds of logs:
are you getting mostly errors, or mostly info
statements?
And you can slice and dice your data
in all sorts of ways, which can allow
you to iterate on your logging solution over time.
EDUARDO SILVA: So, getting back to logging agents,
there are many solutions and open source products available.
But talking specifically about Fluentd,
which is the default agent in Stackdriver and also
for [INAUDIBLE] companies and products,
we can say that Fluentd is more than just
a simple open source project.
I would like to say that it is a whole ecosystem.
The moment there are many companies contributing back
to the same project, it means that the project is
more relevant for the market, and also that it's
becoming a huge solution.
And at some point, people said that Fluentd, for some cases,
was too heavy, or had some performance issues.
And sometimes the question is, how many records are you
ingesting per second?
Sometimes they say, a few thousand per second.
Of course, you're going to have some performance penalties.
You need memory and CPU, and if you are filtering data
through, say, ten filters, that is time-consuming.
And they said, OK, but we need something lightweight.
Fluentd is originally written in Ruby and C.
It's quite performant, but at high scale,
some companies and customers
said, we need something that can be lightweight and more
tied to cloud-native solutions.
And that's why we created a second project, which
is called Fluent Bit, created likewise by Treasure Data, now
driven by the community.
It's also with the Cloud Native Computing Foundation,
and Fluent Bit tries to solve the problems of the cloud-native
space in a different way,
with the same background and experience from Fluentd.
This is not a replacement for Fluentd;
in most cases, people say, I want to use Fluentd,
and for this special namespace or class of workloads,
I'm going to run Fluent Bit, for performance
reasons or for specific features.
The good thing about Fluent Bit is that it's
written purely in the C language.
It's optimized for performance and a low memory footprint.
And it also has a pluggable architecture.
It's pretty much like Fluentd, but the Fluentd ecosystem
is quite huge:
it has around 700 plug-ins,
while here we have around 45.
So you can see that it's more targeted at specific things,
and it's fully integrated with Kubernetes and Prometheus,
and it's a sub-project of the CNCF.
And as a kind of announcement, we
are releasing in a couple of days the new version, which
is called 0.14.
I think that sometimes version numbers don't mean much.
The project has been around for more than three years,
but we're trying to have the 1.0 at the end of the year.
And 0.14 will have integration with Google Stackdriver.
At the beginning, this integration
is quite experimental.
It means that it doesn't have all the features
that Google Fluentd has, but we're working on that.
Also, we will have load-balancing capabilities.
In logging, there is one specific scenario:
if you are dispatching your logs to your service,
and you have some kind of outage,
your logging agent needs to be able to recover
from that situation, meaning doing buffering,
or, if delivery to point A failed,
trying to do some balancing and trying
point B or a different node.
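A hedged sketch of what the buffering-and-retry side of this looks like in a Fluent Bit output section (the host names and values are examples, and the load-balancing behavior described in the talk was still being released at the time):

```
[OUTPUT]
    Name          forward
    Match         *
    Host          log-collector-a.example.com   # point A; retries kick in if it fails
    Port          24224
    Retry_Limit   5                             # how many times to re-attempt delivery
```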
So that balancing capability is coming up right now,
and also, we're happy to say that we
are implementing a new way to do filtering in the logging agent.
If you wanted to do some custom filtering in Fluentd,
you had to write your own Ruby script for filtering.
But if you wanted to do it in Fluent Bit, until a few weeks
ago, you had to write your filter in C.
And some people said, I cannot write C.
It's too complex, and I don't have the time. Which is fair.
We were in Europe at the beginning of this year, at QCon,
and some people said, hey, GDPR is coming.
And I was not aware of GDPR.
OK, what is GDPR?
Explain to me more about it.
And they said, OK, we need to obfuscate data.
We need to add some conditionals to the configuration
of Fluent Bit and fix A, B, C, D. OK, that is too complex.
We cannot maintain multiple plug-ins for different needs.
And yes, they said, our logs are credit card transactions,
and they have the full numbers, and we
need to obfuscate the data.
So we need a solution.
That is important.
So we said, OK, the new way to do filtering will be,
let's try to do some scripting.
You run Fluent Bit as an agent, but also, when it starts,
you can load your own scripts, written in the Lua language,
which is quite performant.
And you can do all the data modifications
that you want without a big performance penalty.
And also, we have an address here.
If you want more of the news on the Fluent Bit 0.14 release,
there's a link on the slides,
a form where you can sign up to get the news.
And taking the Lua script,
this is like a simple plug-in.
Well, this is like a text file.
This is Lua, where basically, you
create a new record using Lua tables,
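For instance, the GDPR-style obfuscation described above can be sketched as a small Lua function. The function name is user-chosen and wired in through Fluent Bit's Lua filter, and the field name `card` is invented for illustration. The callback receives the tag, the timestamp, and the record as a Lua table, and returns a status code, the timestamp, and the new record:

```lua
-- Sketch of a Fluent Bit Lua filter that masks card numbers.
-- A return code of 1 tells Fluent Bit the record was modified.
function cb_mask(tag, timestamp, record)
    local new_record = {}
    for key, value in pairs(record) do
        new_record[key] = value
    end
    if new_record["card"] then
        -- keep only the last four digits (illustrative masking rule)
        new_record["card"] = "************" .. string.sub(new_record["card"], -4)
    end
    return 1, timestamp, new_record
end
```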