Datagrepper Account Parser
This process extracts, from each message on the Fedora Infrastructure message bus, the username associated with that event. Some events are not supported yet (for example, Meetbot does not generate one record per person in the meeting, although it probably should). The table below lists each supported topic pattern and the JSON path the username is read from.
| Topic Pattern | JSON |
|---|---|
| org.fedoraproject.prod.badges.badge.award% | $.user.username |
| org.fedoraproject.prod.fedbadges% | $.user.username |
| org.fedoraproject.prod.discourse.like% | $.webhook_body.like.post.username |
| org.fedoraproject.prod.discourse.post% | $.webhook_body.post.username |
| org.fedoraproject.prod.discourse.solved% | $.webhook_body.solved.username |
| org.fedoraproject.prod.discourse.topic% | $.webhook_body.topic.created_by.username |
| org.fedoraproject.prod.mailman% | $.msg.from |
| org.fedoraproject.prod.planet% | $.username |
| org.fedoraproject.prod.git% | $.commit.username |
| org.fedoraproject.prod.fas% | $.msg.user |
| org.fedoraproject.prod.openqa% | $.user |
| org.fedoraproject.prod.bodhi.buildroot% | $.override.submitter.name |
| org.fedoraproject.prod.bodhi.update.comment% | $.comment.user.name |
| org.fedoraproject.prod.bodhi% | $.update.user.name |
| org.fedoraproject.prod.bugzilla% | $.event.user.login |
| org.fedoraproject.prod.waiver% | $.username |
| org.fedoraproject.prod.fmn% | $.user.name |
| org.fedoraproject.prod.buildsys% | $.owner |
| org.fedoraproject.prod.copr% | $.user |
| io.pagure.prod.pagure% | $.agent |
| org.fedoraproject.prod.pagure.commit.flag% | $.flag.user.name |
| org.centos.sig.integration.gitlab.redhat.centos-stream% | $.user.name |
| org.fedoraproject.prod.wiki% | $.user |
| org.release-monitoring.prod.anitya.% | $.message.agent |
| org.fedoraproject.prod.maubot.cookie.give.% | $.sender |
| org.fedoraproject.prod.kerneltest.upload.new% | $.agent |
| org.fedoraproject.prod.fedocal% | $.agent |
| org.centos.prod.buildsys% | $.owner |
| org.fedoraproject.prod.badges.person.rank.advance% | $.person.nickname |
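The lookup described by the table can be sketched roughly as follows. This is an illustration, not the code in parse.py: it assumes that `%` is a trailing wildcard (as in SQL `LIKE`), that the first matching row wins (which is why the more specific Bodhi patterns are listed before `org.fedoraproject.prod.bodhi%`), and that each path is a plain dotted walk through nested objects. The names `RULES`, `lookup_path`, and `extract_username` are hypothetical.

```python
from typing import Any, Optional

# A few (pattern, path) pairs copied from the table above, in table order.
RULES = [
    ("org.fedoraproject.prod.bodhi.buildroot%", "$.override.submitter.name"),
    ("org.fedoraproject.prod.bodhi.update.comment%", "$.comment.user.name"),
    ("org.fedoraproject.prod.bodhi%", "$.update.user.name"),
    ("org.fedoraproject.prod.copr%", "$.user"),
]


def lookup_path(body: dict, path: str) -> Optional[Any]:
    """Walk a simple `$.a.b.c` path through nested dicts; None if a key is missing."""
    node: Any = body
    # Splitting "$.a.b" gives ["$", "a", "b"]; skip the leading "$".
    for key in path.split(".")[1:]:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def extract_username(topic: str, body: dict) -> Optional[Any]:
    """Return the value at the path of the first rule whose prefix matches the topic."""
    for pattern, path in RULES:
        # Assumption: `%` only appears as a trailing wildcard, so a prefix test suffices.
        if topic.startswith(pattern.rstrip("%")):
            return lookup_path(body, path)
    return None
```

With these assumptions, a topic such as `org.fedoraproject.prod.bodhi.update.comment` resolves through `$.comment.user.name`, while any other Bodhi topic falls through to `$.update.user.name`.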
Only messages with non-null headers and body are processed. Extracted usernames are cleaned of stray characters such as quotes, and any row without a valid username is discarded.
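As a rough illustration of this filtering step, the sketch below trims whitespace and surrounding quotes and drops rows that end up without a username. The exact cleaning rules used by parse.py are not spelled out here, so the rules shown, the row layout (`headers`, `body`, `username` keys), and the function names `clean_username` and `keep_valid` are all assumptions.

```python
from typing import Any, Optional


def clean_username(raw: Any) -> Optional[str]:
    """Trim whitespace and surrounding quotes; None when nothing usable remains."""
    if not isinstance(raw, str):
        return None
    # Assumed rule: only leading/trailing whitespace and quote characters are removed.
    cleaned = raw.strip().strip("\"'").strip()
    return cleaned or None


def keep_valid(rows: list[dict]) -> list[dict]:
    """Drop rows lacking headers or body, then rows whose username does not survive cleaning."""
    kept = []
    for row in rows:
        if row.get("headers") is None or row.get("body") is None:
            continue
        username = clean_username(row.get("username"))
        if username is not None:
            kept.append({**row, "username": username})
    return kept
```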
The result is a set of Parquet files containing only the essential fields:
- `sent_at` timestamp
- `id` of the message
- `topic` of the event
- `username` as the parsed username for the message
Output files are saved in the output_users directory you map into the container, with filenames of the form fedora-{YYYYMMDD}_processed.parquet.
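The output shape can be sketched as below: a row is reduced to the four essential fields, and the filename is built from a date. Which date goes into `{YYYYMMDD}` (for instance the day the messages were sent, or the date of the input file) is not stated here, so the `day` argument is an assumption, as are the names `ESSENTIAL_FIELDS`, `output_filename`, and `project`. Writing the Parquet file itself is left out.

```python
from datetime import date

# The four fields kept in the processed files, as listed above.
ESSENTIAL_FIELDS = ("sent_at", "id", "topic", "username")


def output_filename(day: date) -> str:
    """Build a name of the form fedora-{YYYYMMDD}_processed.parquet."""
    return f"fedora-{day:%Y%m%d}_processed.parquet"


def project(row: dict) -> dict:
    """Keep only the essential fields; fields missing from the row become None."""
    return {field: row.get(field) for field in ESSENTIAL_FIELDS}
```

For example, `output_filename(date(2025, 1, 2))` yields `fedora-20250102_processed.parquet`.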
Usage
Build the container
docker build -t datagrepper-parse-accounts .
Run the container
docker run --rm \
-e INPUT_DIR=/data/input \
-e OUTPUT_DIR=/data/output_users \
-v ~/data/fedora/datagrepper-raw:/data/input:ro \
-v ~/data/fedora/datagrepper-users:/data/output_users \
datagrepper-parse-accounts
Processed Parquet files will be saved to:
~/data/fedora/datagrepper-users
License
This project is licensed under the GNU General Public License v3.0.
Copyright © 2025 Robert Wright
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see https://www.gnu.org/licenses/.