video statistics #46

Open
markvdb opened this issue Nov 21, 2016 · 20 comments

@markvdb
Member

markvdb commented Nov 21, 2016

Richard,

We have a variable number of video streaming frontends serving HLS video. We would like to get reliable visitor statistics:

  • number of watchers per room at any moment during the conference
  • total number of watchers for all rooms at any moment during the conference
  • peak number of watchers per room over the whole conference
  • peak number of watchers for the entire conference

Can you make this for us please?

Thank you!

@RichiH
Member

RichiH commented Nov 21, 2016

@markvdb Do you have this data? In absolute or relative values?

@markvdb
Member Author

markvdb commented Nov 21, 2016

No, that's why we are asking you. How would you collect that data? Easy for nginx-rtmp, but not trivial for HLS streaming...


@krokodilerian
Contributor

I can provide example nginx logs. Basically you see two kinds of requests: requests for the .m3u8 file (the playlist) and for STREAMNAME-[0-9]*.ts, which is the segment the user is currently viewing. If you parse the log and see which segments were watched most, or which IPs watched 10 consecutive segments or whatever, you can probably calculate all the stats.

@RichiH
Member

RichiH commented Nov 21, 2016

Without a bit more context, I am not even sure what exactly you're asking us to do.

What does the stack look like? What other tools exist which do similar? Which bits and pieces of the nginx logs are the most relevant and interesting ones in your experience? Is there a tapout for basic stats which we could use? Do you already collect partial info which we could re-use or glean implementation details from?

Something along those lines.

@krokodilerian
Contributor

Okay, I'll try to explain a bit better :)
(@RichiH, I've mailed you an example log file)

The only way I know to measure anything related to HLS is by parsing the access logs.

HLS (HTTP Live Streaming) is a moderately stupid standard for video streaming that does the following:

  • It splits the video into 5-second chunks named STREAMNAME-NUMBER.ts (NUMBER is consecutive, from 1 to infinity);
  • On each new chunk it regenerates a "playlist" file (named STREAMNAME.m3u8) with the last 5 (or more) chunks;
  • It deletes (optionally) the older chunks.

The clients continuously re-fetch STREAMNAME.m3u8 and the .ts files it lists.

For example, to find out how many hours of video were watched for a single room, you count the .ts requests and treat each one as 5 seconds.

If you want to find out how many unique people watched a stream, you filter the .ts requests for that stream, get the list of unique IP addresses, and for each of them check whether it fetched more than 12 .ts files (1 minute). This isn't perfect, but should be close to reality.

If you want to see when a room peaked, you take all the .ts requests for that room and see which segment was downloaded most.
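
A rough sketch of that log crunching in Python, assuming the default nginx "combined" access-log format and the STREAMNAME-NUMBER.ts naming described above (the regexes, the log path and the 5-second segment length are assumptions to adjust to the real setup):

import re
from collections import Counter, defaultdict

# Assumed formats: nginx "combined" access log, segments named STREAMNAME-NUMBER.ts.
LOG_RE = re.compile(r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "GET (?P<path>\S+) [^"]*" (?P<status>\d{3})')
SEGMENT_RE = re.compile(r'/(?P<stream>[\w-]+?)-(?P<num>\d+)\.ts$')
SEGMENT_SECONDS = 5           # each .ts chunk is ~5 seconds of video
MIN_SEGMENTS_PER_VIEWER = 12  # >= 1 minute watched counts as a real viewer

def hls_stats(lines):
    segments = Counter()                # total .ts requests per stream
    per_ip = defaultdict(Counter)       # .ts requests per stream and client IP
    per_segment = defaultdict(Counter)  # downloads per individual segment

    for line in lines:
        m = LOG_RE.match(line)
        if not m or not m.group("status").startswith("2"):
            continue
        seg = SEGMENT_RE.search(m.group("path"))
        if not seg:
            continue                    # playlist (.m3u8) or unrelated request
        stream = seg.group("stream")
        segments[stream] += 1
        per_ip[stream][m.group("ip")] += 1
        per_segment[stream][int(seg.group("num"))] += 1

    for stream in segments:
        hours = segments[stream] * SEGMENT_SECONDS / 3600
        viewers = sum(1 for n in per_ip[stream].values() if n >= MIN_SEGMENTS_PER_VIEWER)
        peak_segment, peak = per_segment[stream].most_common(1)[0]
        print(f"{stream}: {hours:.1f}h watched, {viewers} viewers, "
              f"peak of {peak} downloads at segment {peak_segment}")

with open("/var/log/nginx/access.log") as f:   # assumed log location
    hls_stats(f)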

Does this answer the question, or am I explaining something completely different and useless? :)

@RichiH
Member

RichiH commented Nov 21, 2016

If that's all you want, that seems almost trivial. As a lot of that is domain-specific to video, I would prefer to teach you how to fish so you can add new stuff yourself. If you really just need someone to write a log file parser, we can help there as well.

Let's assume we have two rooms, A and B, and two stream qualities, 720p and 1080p. After parsing the current logs, you would generate a text file like

hls_watchers_current{room="A",quality="720p"} 10
hls_watchers_current{room="A",quality="1080p"} 15
hls_watchers_current{room="B",quality="720p"} 19
hls_watchers_current{room="B",quality="1080p"} 35

and be done with it. The rest can be handed off to Prometheus and it will magically do the right thing for you.
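
A minimal sketch of generating such a file, for example for the node_exporter textfile collector (the counts dict and the output directory are made-up placeholders, not part of the setup discussed here):

import os, tempfile

# Made-up example counts; in practice these come from the log parser.
counts = {("A", "720p"): 10, ("A", "1080p"): 15,
          ("B", "720p"): 19, ("B", "1080p"): 35}

lines = ["# TYPE hls_watchers_current gauge"]
for (room, quality), watchers in sorted(counts.items()):
    lines.append(f'hls_watchers_current{{room="{room}",quality="{quality}"}} {watchers}')

# Write atomically so the collector never picks up a half-written file.
outdir = "/var/lib/node_exporter/textfile"      # assumed textfile-collector directory
fd, tmp = tempfile.mkstemp(dir=outdir)
with os.fdopen(fd, "w") as f:
    f.write("\n".join(lines) + "\n")
os.chmod(tmp, 0o644)                            # make it readable for the exporter
os.rename(tmp, os.path.join(outdir, "hls_watchers.prom"))

Once Prometheus scrapes that gauge, sum() gives the per-room and overall totals, and max_over_time() (combined with a recording rule for the overall peak) covers the peak questions from the original request.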

@RichiH
Member

RichiH commented Nov 23, 2016

Is there any update on this? I still think it would be better in the long-term if you did it and we help you understand why & how.

@krokodilerian
Contributor

Sorry, I have been very busy. I can do the parser that deals with it and just generates the file; shouldn't be a problem.

@RichiH
Member

RichiH commented Nov 23, 2016 via email

@SuperQ
Contributor

SuperQ commented Nov 23, 2016

For log file parsing, I highly recommend https://github.com/google/mtail or https://github.com/fstab/grok_exporter.

If I can have some sample nginx logs I can write this easily.

@SuperQ
Contributor

SuperQ commented Nov 26, 2016

@krokodilerian Please post some log samples, or point me to the code that generates the logs.

@krokodilerian
Contributor

@SuperQ, I have a dump of the logs (~607 MB); how do I get them to you?

@SuperQ
Contributor

SuperQ commented Feb 7, 2017

Ahh, the logs were supposed to be used for live monitoring. Having them later isn't necessary. We can possibly do this for 2018.

All I really needed was a representative sample, or the log format code.

@krokodilerian
Contributor

I'll send a 10 MB excerpt from a log somewhere; would email work?

@SuperQ
Contributor

SuperQ commented Feb 7, 2017

I grabbed a copy of the logs from the stream frontend host.

@markvdb
Member Author

markvdb commented Dec 1, 2018

Hello Ben,

Any update on this? If not, we'll go by the total bandwidth consumption of the streamers...

Thank you!

Mark

@SuperQ
Contributor

SuperQ commented Dec 2, 2018

I think we implemented this in mtail. See the nginx.mtail file.

I don't know where I put the log file from the streaming front-end to validate the functionality. If anyone has some backup logs, that'd be helpful.

@SuperQ
Contributor

SuperQ commented Dec 2, 2018

I added #129 to also give us better internal metrics from nginx.

@markvdb markvdb assigned bastischubert and unassigned RichiH and SuperQ Oct 20, 2023
@markvdb markvdb added 2024 and removed 2019 labels Oct 20, 2023
@bastischubert
Member

We can use promtail for that, as we have a Loki instance up and running.

First we need nginx to produce a JSON access log that we can parse properly:

log_format json_analytics escape=json '{'
                    '"msec": "$msec", ' # request unixtime in seconds with a milliseconds resolution
                    '"connection": "$connection", ' # connection serial number
                    '"connection_requests": "$connection_requests", ' # number of requests made in connection
                    '"pid": "$pid", ' # process pid
                    '"request_id": "$request_id", ' # the unique request id
                    '"request_length": "$request_length", ' # request length (including headers and body)
                    '"remote_addr": "$remote_addr", ' # client IP
                    '"remote_user": "$remote_user", ' # client HTTP username
                    '"remote_port": "$remote_port", ' # client port
                    '"time_local": "$time_local", '
                    '"time_iso8601": "$time_iso8601", ' # local time in the ISO 8601 standard format
                    '"request": "$request", ' # full request line, without arguments
                    '"request_uri": "$request_uri", ' # full path and arguments of the request
                    '"args": "$args", ' # args
                    '"status": "$status", ' # response status code
                    '"body_bytes_sent": "$body_bytes_sent", ' # the number of body bytes sent to a client, excluding headers
                    '"bytes_sent": "$bytes_sent", ' # the number of bytes sent to a client
                    '"http_referer": "$http_referer", ' # HTTP referer
                    '"http_user_agent": "$http_user_agent", ' # user agent
                    '"http_x_forwarded_for": "$http_x_forwarded_for", ' # http_x_forwarded_for
                    '"http_host": "$http_host", ' # the request Host: header
                    '"server_name": "$server_name", ' # the name of the vhost serving the request
                    '"request_time": "$request_time", ' # request processing time in seconds with msec resolution
                    '"upstream": "$upstream_addr", ' # upstream backend server for proxied requests
                    '"upstream_connect_time": "$upstream_connect_time", ' # upstream handshake time incl. TLS
                    '"upstream_header_time": "$upstream_header_time", ' # time spent receiving upstream headers
                    '"upstream_response_time": "$upstream_response_time", ' # time spent receiving upstream body
                    '"upstream_response_length": "$upstream_response_length", ' # upstream response length
                    '"upstream_cache_status": "$upstream_cache_status", ' # cache HIT/MISS where applicable
                    '"ssl_protocol": "$ssl_protocol", ' # TLS protocol
                    '"ssl_cipher": "$ssl_cipher", ' # TLS cipher
                    '"scheme": "$scheme", ' # http or https
                    '"request_method": "$request_method", ' # request method
                    '"server_protocol": "$server_protocol", ' # request protocol, like HTTP/1.1 or HTTP/2.0
                    '"pipe": "$pipe", ' # "p" if request was pipelined, "." otherwise
                    '"gzip_ratio": "$gzip_ratio"' # no trailing comma on the last field, to keep the JSON valid
                    '}';

...
access_log /var/log/nginx/access.log json_analytics;

Feel free to remove JSON fields from the output that are not needed.
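
For reference, a small Python sketch of how one json_analytics line could be turned into per-room watcher labels; the ROOM-QUALITY-NUMBER.ts URL pattern is an assumption, and with promtail the same extraction would live in its json/regex pipeline stages and be aggregated with LogQL instead:

import json, re

# Assumed segment naming: /ROOM-QUALITY-NUMBER.ts -- adjust to the real URLs.
SEGMENT_RE = re.compile(r"/(?P<room>\w+)-(?P<quality>\d+p)-\d+\.ts$")

def watcher_labels(json_line):
    """Return the labels to aggregate watchers on, or None for non-segment requests."""
    entry = json.loads(json_line)
    m = SEGMENT_RE.search(entry["request_uri"])
    if not m or not entry["status"].startswith("2"):
        return None
    return {"room": m.group("room"),
            "quality": m.group("quality"),
            "client": entry["remote_addr"]}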

krokodilerian added a commit to FOSDEM/infrastructure that referenced this issue Oct 23, 2023
@krokodilerian
Contributor

Here you go: FOSDEM/infrastructure@fa7088e
