Centralized Logging Solution for Google Cloud Platform (Cloud Next '18)
because we need to create, or try to gather, some context.
And what do we mean by context? Your application is
running in a pod.
That pod has a name.
That pod has an ID, an identification.
You're also running in a namespace:
if you deploy a pod, that pod belongs to a namespace.
It's running on a node, on a host.
And maybe you added some labels and annotations to the pod,
for example a label saying that this pod
belongs to production, quality, testing,
or anything like that.
So how do you create all this context
for an application which is running in Kubernetes?
This whole context also adds more complexity.
And here is why.
The container name and the container ID
you can find in the file system.
And I'm talking here about the very low level
of how things operate.
You can get that.
But all of the other components come from the Kubernetes API
server, the master server.
So you have some information that lives locally
on the node and other information that lives
on a separate machine.
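To make the "local" half concrete: on each node, the container runtime writes one JSON log file per container, and the pod, namespace, and container name are encoded in the filename. The exact convention varies by Kubernetes version, and the path and names below are illustrative, not taken from the talk. A minimal sketch of pulling that local context apart:

```python
# Hypothetical filename under /var/log/containers/, following the common
# <pod-name>_<namespace>_<container-name>-<container-id>.log convention.
fname = "web-5d4f8b9c6-x2x7q_production_nginx-4f1d2c3b.log"

stem = fname.rsplit(".", 1)[0]           # drop the .log extension
pod, namespace, rest = stem.split("_")   # underscores separate the fields
container = rest.rsplit("-", 1)[0]       # strip the trailing container ID

print(pod, namespace, container)
# -> web-5d4f8b9c6-x2x7q production nginx
```

Everything else, such as labels, annotations, and the pod's UID, has to be fetched from the API server and joined onto each record by the agent.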
And you need your logging agent
to be able to bring all of this information together.
And that's why the logging agent that you use, and that
Stackdriver uses, needs to handle all of this for you,
because of course you cannot do this manually.
Your logging agent needs to be able to crawl the file
system, understand the messages, parse the messages,
which are in different formats,
and correlate all the information together
with the context coming from the API server.
And that's Fluentd.
Actually, Stackdriver uses Fluentd.
You will find it as Google Fluentd,
because it's a Fluentd package that's
ready to use with Google services, with
all the extensions and the integration
with Stackdriver right out of the box.
And once you have Fluentd as a log processor,
you can send things to Stackdriver, or maybe to
your own database.
Maybe you're running on-prem, or you have your own system
to store logs.
So that's why it's not easy to be a logging agent:
the tool also needs to be able to communicate
with different components over the network.
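That routing is expressed in the agent's configuration. As an illustrative sketch (tags and file paths are examples, not from the talk), a Fluentd configuration can send container logs to Stackdriver via the `google_cloud` output plugin shipped with the google-fluentd package, and everything else to a local file:

```
<match kubernetes.**>
  @type google_cloud        # provided by the google-fluentd package
</match>

<match **>
  @type file                # fallback: keep other logs on local disk
  path /var/log/fluentd/fallback
</match>
```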
Now, log processing in Kubernetes is like the next step.
Let's say that we have just a single node;
well, we'll have the master and a single node.
Fluentd, as a logging agent, needs to run as a DaemonSet.
Do you know what a DaemonSet is?
Are you familiar with pods?
We just spoke about them.
OK.
A DaemonSet is like a pod, but this pod
runs on every node of the cluster.
So if you append more nodes to your Kubernetes cluster,
like another machine, and you have already deployed
a DaemonSet, you're going to get a new pod from that DaemonSet
running on that new node.
So the way it works is that you deploy Fluentd as a DaemonSet.
Of course, this is already
working by default. What it does is run,
and it starts reading all the log files that we explained
a few slides ago, with all the content from the JSON
files that were created.
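A minimal sketch of such a DaemonSet follows; the names and the image tag are illustrative, not the exact manifest a managed cluster deploys. The key parts are the DaemonSet kind, which schedules one pod per node, and the hostPath mount, which gives the agent read access to the node's log files:

```yaml
apiVersion: apps/v1
kind: DaemonSet                  # one Fluentd pod per node
metadata:
  name: fluentd
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      containers:
      - name: fluentd
        image: fluent/fluentd:v1.2   # example tag
        volumeMounts:
        - name: varlog               # read the per-container JSON logs
          mountPath: /var/log
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log             # the node's own log directory
```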
Once it reads the whole content from the application,
it makes sure to gather the metadata, associating
every message with its container,
with its pod, with labels, annotations, and so on.
So the simple message that we started with at the beginning
becomes something like this, with context.
Because it has the Kubernetes host, pod name, pod ID,
container name, namespace, and so on.
And then, in Stackdriver or in any database that you're using,
you don't want to query, show me the messages that start
with "Hey Next."
You want to say: show me all the messages where,
for example, the Kubernetes container name equals "next."
So this metadata allows you to do better data analysis
and helps the storage solution surface the data that you really
care about instead of querying all of it.
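As an illustrative sketch (field names vary by agent and version; the values here are invented), the enriched record might look like:

```json
{
  "log": "Hey Next, hello world\n",
  "stream": "stdout",
  "time": "2018-07-24T10:15:00Z",
  "kubernetes": {
    "namespace_name": "production",
    "pod_name": "web-5d4f8b9c6-x2x7q",
    "container_name": "next",
    "host": "gke-node-1",
    "labels": { "tier": "frontend" }
  }
}
```

A query can then filter on `kubernetes.container_name` instead of pattern-matching on the raw message text.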
MARY KOES: Thanks.
So I mentioned earlier that we have the same team that
supports both our Cloud Logging solution and Google's
internal logging solution.
So we've learned a lot of lessons
talking to customers like yourselves and to customers
at Google
who use the logging tools that we build.
So I wanted to share some lessons learned.
One is the value of structured logs,
so not all logs are created equal.
They all have great potential, but if we format them
in a specific way, we're able to answer questions more easily.
So an unstructured log might have
a text payload of blah, blah, blah, purchased four widgets.
A structured log will have a lot of key-value
pairs in JSON format.
This makes it much easier, then, to answer questions like,
how many unique customers purchased a widget?
How many widgets were purchased overall?
If you use structured logs, then
when you, for example, send them to BigQuery,
that structure will persist, making
it much easier to interact with your logs downstream as well.
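A hedged sketch of the difference (the field names are invented for illustration, not a Stackdriver schema):

```python
import json

# Unstructured: one opaque string; counting widgets means string parsing.
unstructured = "blah blah blah purchased 4 widgets"

# Structured: explicit key-value pairs that log filters and BigQuery
# can query directly.
structured = {
    "event": "purchase",
    "customer_id_hash": "4a6f",   # illustrative field, easy to display in a UI
    "item": "widget",
    "quantity": 4,
}
line = json.dumps(structured)
print(line)
```

Questions like "how many widgets were purchased overall" become a sum over `quantity` rather than a regex over free text.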
Plus, you can do things in Stackdriver logging like say,
show me the customer ID hash.
I want that to be displayed easily,
so I can understand what's going on more quickly.
The other tip that we have learned over the years
is that it can be really tricky to get
the right level of logging.
If you do too much, it's crazy expensive,
and you can't find anything in your logs.
If you do too little, then you don't have the data
that you need when you need it.
And so trying to get that just right level of logs
can be a real challenge both internal to Google
and for our customers.
There's no magic bullet here, but one thing
we found to be really helpful is to enforce
a consistent definition for logging levels
through your code.
So an error should be something that really stopped your code,
something you expect somebody to actually investigate,
whereas debug and info are less important.
And that makes it easier, when I filter,
to quickly find the information that we care about.
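As a sketch of what a consistent convention might look like in application code (the level meanings here are one team's choice, stated in the comments, not a standard):

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("checkout")

# Convention enforced across the codebase (illustrative):
#   ERROR   the request failed; a human should investigate
#   WARNING degraded but recovered (retry succeeded, fallback used)
#   INFO    normal lifecycle events
#   DEBUG   verbose detail, disabled outside development
log.debug("cart contents: %s", ["widget"] * 4)  # filtered out at INFO level
log.info("order accepted")
log.error("payment gateway unreachable; order aborted")
```

Because the levels mean the same thing everywhere, filtering on `severity >= ERROR` reliably surfaces only the entries someone should act on.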
And the final kind of insight we've derived over time
is that it's really helpful to have data about the log volume
so that you can iterate and refine this over time.
So specifically, in Stackdriver Logging,
we have metrics at a very fine-grained level,
so you can see not only which resources are sending logs,
but also which kinds of logs:
are you getting mostly errors, or mostly info
statements?
And you can slice and dice your data
in all sorts of ways, which can allow
you to iterate on your logging solution over time.
EDUARDO SILVA: So, getting back to logging agents,
there are many solutions and open source products available.
But talking specifically about Fluentd,
which is the default agent in Stackdriver and also
for [INAUDIBLE] companies and products,
we can say that Fluentd is more than just
a simple open source project.
I would like to say that it is a whole ecosystem.
The moment there are many companies contributing back
to the same project, it means that the project is
more relevant for the market, and also that it's
becoming a huge solution.
And at some point, people said that Fluentd, for some cases,
was too heavy, or had some performance issues.
And sometimes the question is, how many records are you
ingesting per second?
Sometimes they say, a few thousand per second.
Of course, you're going to have some performance penalties.
You need memory and CPU, and if you are filtering data
through, say, ten filters, that is time-consuming.
And they said, OK, but we need something lightweight.
Fluentd is originally written in Ruby and C.
It's quite performant, but at high scale,
some companies and customers
said, we need something that can be lightweight and more
tied to cloud-native solutions.
And that's why we created a second project, which
is called Fluent Bit, created likewise by Treasure Data, now
driven by the community.
It's also with the Cloud Native Computing Foundation,
and Fluent Bit tries to solve the problems of the cloud-native
space in a different way,
with the same background and experience from Fluentd.
This is not a replacement for Fluentd;
in most cases, people say, I want to use Fluentd,
and for this special namespace or class of workloads,
I'm going to run Fluent Bit, for performance
reasons or for specific features.
The good thing about Fluent Bit is that it's
written purely in the C language.
It's optimized for performance and a low memory footprint.
And it also has a pluggable architecture.
It's pretty much like Fluentd, but the Fluentd ecosystem
is quite huge:
it has around 700 plug-ins,
while here we have around 45.
So you can see that it's more targeted at specific things,
and it's fully integrated with Kubernetes and Prometheus,
and it's a sub-project of the CNCF.
And as a kind of announcement, we
are releasing in a couple of days the new version, which
is called 0.14.
I think that sometimes version numbers don't mean much.
The project has been around for more than three years,
but we're trying to have the 1.0 at the end of the year.
And 0.14 will have integration with Google Stackdriver.
At the beginning, this integration
is quite experimental.
It means that it doesn't have all the features
that Google Fluentd has, but we're working on that.
Also, we will have load-balancing capabilities.
In logging, there is one specific scenario:
if you are dispatching your logs to your service,
and you have some kind of outage,
your logging agent needs to be able to recover
from that situation, meaning doing buffering,
or, if delivery to point A failed,
trying to do some balancing and trying
point B or a different node.
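A hedged sketch of what the buffering-and-retry side of this looks like in a Fluent Bit output section (the host names and values are examples, and the load-balancing behavior described in the talk was still being released at the time):

```
[OUTPUT]
    Name          forward
    Match         *
    Host          log-collector-a.example.com   # point A; retries kick in if it fails
    Port          24224
    Retry_Limit   5                             # how many times to re-attempt delivery
```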
So that balancing capability is coming up right now,
and also, we're happy to say that we
are implementing a new way to do filtering in the logging agent.
If you wanted to do some custom filtering in Fluentd,
you had to write your own Ruby script for filtering.
But if you wanted to do it in Fluent Bit, until a few weeks
ago, you had to write your filter in C.
And some people said, I cannot write C.
It's too complex, and I don't have the time. Which is fair.
We were in Europe at the beginning of this year, at QCon,
and some people said, hey, GDPR is coming.
And I was not aware of GDPR.
OK, what is GDPR?
Explain to me more about it.
And they said, OK, we need to obfuscate data.
We need to add some conditionals to the configuration
of Fluent Bit and fix A, B, C, D. OK, that is too complex.
We cannot maintain multiple plug-ins for different needs.
And yes, they said, our logs are credit card transactions,
and they have the full numbers, and we
need to obfuscate the data.
So we need a solution.
That is important.
So we said, OK, the new way to do filtering will be,
let's try to do some scripting.
You run Fluent Bit as an agent, but also, when it starts,
you can load your own scripts, written in the Lua language,
which is quite performant.
And you can do all the data modifications
that you want without a big performance penalty.
And also, we have an address here.
If you want more of the news on the Fluent Bit 0.14 release,
there's a link on the slides,
a form where you can sign up to get the news.
And taking the Lua script,
this is like a simple plug-in.
Well, this is like a text file.
This is Lua, where basically, you
create a new record using Lua tables,
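For instance, the GDPR-style obfuscation described above can be sketched as a small Lua function. The function name is user-chosen and wired in through Fluent Bit's Lua filter, and the field name `card` is invented for illustration. The callback receives the tag, the timestamp, and the record as a Lua table, and returns a status code, the timestamp, and the new record:

```lua
-- Sketch of a Fluent Bit Lua filter that masks card numbers.
-- A return code of 1 tells Fluent Bit the record was modified.
function cb_mask(tag, timestamp, record)
    local new_record = {}
    for key, value in pairs(record) do
        new_record[key] = value
    end
    if new_record["card"] then
        -- keep only the last four digits (illustrative masking rule)
        new_record["card"] = "************" .. string.sub(new_record["card"], -4)
    end
    return 1, timestamp, new_record
end
```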