>>6986
>are you the scala guy?
Yep. Thanks for the feedback and encouragement, comrade. It is appreciated.
>you will have to be doing alot of joins + group by functions and possibly some window (over/partition by) as well, which will be pretty slow.
That's true. SQL is really fast, but I guess once you start getting some serious numbers, things could get slow. An easy way to remedy this is to de-normalize and put a board reference on posts. I don't have much experience with very high scale queries, so I'm not sure how non-performant this could get. In vichan, each board is its own table, so you have to do a query for each table to count IPs, which I suspect is less performant than a single query with a group by.
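Rough sketch of what I mean by the de-normalized board reference, in Python with sqlite3 as a stand-in (the real schema is Postgres, and the table and column names here are made up for illustration). One GROUP BY over posts gives unique IPs per board without going through threads:

```python
import sqlite3

# sqlite3 is only a stand-in for Postgres; names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE boards (id INTEGER PRIMARY KEY, uri TEXT NOT NULL UNIQUE);
    CREATE TABLE threads (
        id INTEGER PRIMARY KEY,
        board_id INTEGER NOT NULL REFERENCES boards(id)
    );
    -- posts.board_id duplicates threads.board_id: this is the de-normalized column
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        thread_id INTEGER REFERENCES threads(id),
        board_id INTEGER NOT NULL REFERENCES boards(id),
        ip TEXT NOT NULL
    );
    CREATE INDEX posts_board_ip ON posts (board_id, ip);
""")
conn.executemany("INSERT INTO boards (id, uri) VALUES (?, ?)",
                 [(1, "leftypol"), (2, "tech")])
conn.executemany("INSERT INTO threads (id, board_id) VALUES (?, ?)",
                 [(1, 1), (2, 2)])
conn.executemany(
    "INSERT INTO posts (thread_id, board_id, ip) VALUES (?, ?, ?)",
    [(1, 1, "10.0.0.1"), (1, 1, "10.0.0.1"), (1, 1, "10.0.0.2"), (2, 2, "10.0.0.3")],
)

def unique_ips_per_board(conn):
    """Count distinct poster IPs per board with a single grouped query."""
    rows = conn.execute("""
        SELECT b.uri, COUNT(DISTINCT p.ip)
        FROM posts p
        JOIN boards b ON b.id = p.board_id
        GROUP BY b.uri
        ORDER BY b.uri
    """).fetchall()
    return dict(rows)

ip_counts = unique_ips_per_board(conn)
```

The trade-off is that board_id on posts has to be kept consistent with the thread's board on every insert.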
>t could make it vulnerable to DOS/refreshing.
I'm using redis and I plan on using it to cache certain results, metrics being the obvious one, but if possible, I'd love to cache shit like "all posts in a thread". I'm very much making this shit up as I go lol, I don't have experience with scala besides this, so I don't even know if this is possible with the libraries I'm using.
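The caching idea is basically cache-aside: look in the cache first, hit the database only on a miss, store the result with an expiry. A minimal Python sketch, assuming an in-memory dict as a stand-in for redis (TTLCache, cached_thread_posts and the key format are all hypothetical, not from any library):

```python
import time

class TTLCache:
    """In-memory stand-in for a redis-style get / set-with-expiry store."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._data[key]
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, self._clock() + ttl_seconds)

    def delete(self, key):
        self._data.pop(key, None)

def cached_thread_posts(cache, thread_id, load_posts, ttl_seconds=30):
    """Return the posts of a thread, calling load_posts only on a cache miss."""
    key = f"thread:{thread_id}:posts"
    posts = cache.get(key)
    if posts is None:
        posts = load_posts(thread_id)
        cache.set(key, posts, ttl_seconds)
    return posts
```

When a new post lands in a thread, deleting that thread's key would keep readers from seeing a stale thread until the TTL runs out.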
>Inserting a new thread has a mutual dependence
Yes! Good catch. I bit the bullet and made post's references to threads nullable. There might be situations in the future where a reference to a thread might not be necessary (like associating posts with bans?) so I opted to relax that constraint.
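To show how the nullable reference breaks the cycle, here is a sketch in Python with sqlite3 standing in for Postgres. It assumes the thread row points at its OP post through a NOT NULL column (op_post_id is a made-up name), which is my guess at what the mutual dependence looks like:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # sqlite does not enforce FKs by default
conn.executescript("""
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        thread_id INTEGER REFERENCES threads(id),  -- nullable on purpose
        body TEXT NOT NULL
    );
    CREATE TABLE threads (
        id INTEGER PRIMARY KEY,
        op_post_id INTEGER NOT NULL REFERENCES posts(id)
    );
""")

def create_thread(conn, body):
    """Insert the OP without a thread, create the thread, then link the OP back."""
    with conn:  # one transaction: commits on success, rolls back on error
        post_id = conn.execute(
            "INSERT INTO posts (thread_id, body) VALUES (NULL, ?)", (body,)
        ).lastrowid
        thread_id = conn.execute(
            "INSERT INTO threads (op_post_id) VALUES (?)", (post_id,)
        ).lastrowid
        conn.execute(
            "UPDATE posts SET thread_id = ? WHERE id = ?", (thread_id, post_id)
        )
    return thread_id, post_id

thread_id, post_id = create_thread(conn, "first post")
linked_thread = conn.execute(
    "SELECT thread_id FROM posts WHERE id = ?", (post_id,)
).fetchone()[0]
```

Doing all three statements in one transaction means nobody ever sees an OP that has no thread yet.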
>hope you don't mind if other people steal it
That would be very fulfilling. That's why I chose to use the PG Modeler, so that I could make nice pictures and others can use this work as reference/inspiration.
>autosage vs bumplock?
autosage is when the thread reaches bumplimit. Here it's like 750 posts.
bumplocked is when a mod stops it from bumping, aka anchoring.
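In code the difference between the two could look something like this. It's a Python sketch with hypothetical names (Thread, should_bump), and the limit of 750 is just the number mentioned above:

```python
from dataclasses import dataclass

BUMP_LIMIT = 750  # "like 750 posts" here; would be configurable in practice

@dataclass
class Thread:
    reply_count: int
    bumplocked: bool = False  # set by a mod (anchoring)

def is_autosaged(thread, bump_limit=BUMP_LIMIT):
    """Autosage: the thread reached the bump limit on its own."""
    return thread.reply_count >= bump_limit

def should_bump(thread, sage=False, bump_limit=BUMP_LIMIT):
    """A new reply bumps unless it is saged, the thread is anchored, or autosaged."""
    return not (sage or thread.bumplocked or is_autosaged(thread, bump_limit))
```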
Update: Users can now create threads and posts (text only).
This means most (important) non-authed endpoints exist.
This shit feels insurmountable at times...
When posting, lainchan passes the request through at least these filters/functions, some optional and set in the configs, others obligatory:
- Validate captcha
- Validate bot obfuscation (field random data to confuse bots)
- Validate the request's referrer (reject posts whose HTTP_REFERER is missing or doesn't match the site)
- Validate IP is not blacklisted by third party lists (aka DNSBL)
- Validate IP is not banned
- Validate IP is not robot muted (You get muted for x time if you post an identical message).
- Validate that thread exists
- Validate embed
- Rewrite fields if such a configuration is set (eg. force anonymous names so namefagging is not allowed)
- Perform upload by URL (ie you place a URL in the post and the server downloads the file for you)
- If no name is chosen, set name to one of the "anonymous" names. (in lainchan, you can set an array of names you wish to use instead of "anonymous").
- Validate that if the post is an OP it has an attachment.
- Validate post's body's minimum length.
- Validate that thread is not locked.
- Validate that the thread is not full (maximum number of posts in a thread, autolocks).
- Validate that the thread is not full in number of permitted file attachments.
- Validate the size of each file doesn't exceed max_size
- Validate the total size of all files doesn't exceed max_size
- Validate capcode if mod
- Validate joke capcode (setting to allow users to post as ## moot or ## hacker etc)
- Generate trip code
- Check for noko/nonoko in email field
- Validate filename extensions
- Generate filename using timestamp or a custom generator function
- Validate number of files doesn't exceed max_files
- Strip illegal characters from text fields
- Validate string lengths in each field (name, subject, etc)
- Apply wordfilters to body
- Apply country flags (using IP info)
- Apply user flags
- Apply markup
- Validate number of cites doesn't exceed max_cites (cites are >>{id})
- Track cites (validates each and stores them)
- Apply markup for cites
- Truncate filename to max_filename_length
- Calculate md5 checksum
- Calculate md5 of all checksums together.
- Check post for Flood (per IP) (eg limit posts by IP per minute)
- Check post for Flood (per file) (eg limit hash per minute)
- Check post for Flood (per body) (eg limit hashed body per minute)
- Check post using custom filters
- Apply extension functions (apply image transforms and create thumbnails (imagemagick+exif stuff), pdf, epub, txt, etc)
- If file has a thumbnail, OCR it and add it to the body (using hidden markup) to pass it through spam filters
- Validate image md5 is unique in board
- Validate image md5 is unique in thread
- If OCR was used, check flood and custom filters again lol
- Validate originality of body, mute if unoriginal (and config is set).
- Insert post into database
- Slugify post
- Insert post information to flood table
- If thread is cycled, delete all posts except last X posts and OP
- Update table of tracked cites
- bump thread (if not bumplocked, or full, or saged)
- handle noko
(async from here on)
- build thread's static HTML
- build board's static HTML
- rebuild any other themes that might be subscribed to the "post" or "post-thread" event (such as the homepage, overboards, and catalogs)
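One way to keep a list like that manageable is to treat every item as a small step that takes a post and either returns it (possibly modified) or rejects it, then run the steps in order. A Python sketch covering only four of the steps above; every name here is hypothetical and this is not how lainchan structures it:

```python
import re
from dataclasses import dataclass, replace

class PostRejected(Exception):
    """Raised by a step to abort the pipeline with a reason."""

@dataclass(frozen=True)
class Post:
    ip: str
    body: str
    name: str = ""

def check_not_banned(banned_ips):
    def step(post):
        if post.ip in banned_ips:
            raise PostRejected("banned")
        return post
    return step

def default_name(fallback):
    def step(post):
        # Rewrite step: fill in the anonymous name when none was given.
        return post if post.name else replace(post, name=fallback)
    return step

def check_min_body_length(min_length):
    def step(post):
        if len(post.body.strip()) < min_length:
            raise PostRejected("body too short")
        return post
    return step

CITE = re.compile(r">>(\d+)")  # cites are >>{id}

def check_max_cites(max_cites):
    def step(post):
        if len(CITE.findall(post.body)) > max_cites:
            raise PostRejected("too many cites")
        return post
    return step

def run_pipeline(post, steps):
    """Apply each step in order; the first PostRejected stops everything."""
    for step in steps:
        post = step(post)
    return post

steps = [
    check_not_banned({"10.0.0.66"}),
    default_name("Anonymous"),
    check_min_body_length(1),
    check_max_cites(2),
]
accepted = run_pipeline(Post(ip="10.0.0.1", body=">>1 >>2 hello"), steps)
```

Since each step is built from its config value, optional filters can simply be left out of the list when the setting is off.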
vichan/lainchan is incredibly configurable. Who gets to do what, filters, etc etc. I'm still thinking how to implement permissions. In lainchan, it's (mostly) done in the config files.
Eg:
$config['mod']['flood'] = MOD;
// Raw HTML posting
$config['mod']['rawhtml'] = ADMIN;
// View the report queue
$config['mod']['reports'] = JANITOR;
// Allow OP to remove arbitrary posts in his thread
$config['user_moderation'] = false;
I'd rather avoid that and have permissions be editable on some admin interface, plus shit like roles etc.
I was thinking of making everything be behind some permission framework, even if it slows down stuff. The default role would be "anonymous", so you could even make user-only boards. Say for example a secret /i/ board that is only available to those that have been given read permission to /i/.
With the coming of IPv6 and the increasing accessibility of VPNs, Tor, etc., it's becoming more and more difficult to ban spammers/wreckers. Ideally, sessions would also have "history". For example, if your session has completed a captcha in the last X minutes/hours, then you can post without captcha.
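The session "history" check on its own is simple. A Python sketch, assuming the session stores when a captcha was last solved (Session, needs_captcha and grace_seconds are made-up names):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Session:
    # Unix timestamp of the last solved captcha, None if never solved.
    last_captcha_solved_at: Optional[float] = None

def needs_captcha(session, now, grace_seconds):
    """True unless the session solved a captcha within the last grace_seconds."""
    solved = session.last_captcha_solved_at
    return solved is None or now - solved > grace_seconds
```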
I don't know how to model this.
I'm thinking something like:
Board /leftypol/ settings:
Role: "EasyCaptchaSolver" Permission: "Post" + "Report"
Role: "MediumCaptchaSolver" Permission: … + "Post with image"
Role: "HardCaptchaSolver" Permission: … + "Create thread"
Role: "Janitor" Permission: … + "Soft delete posts/threads"
Role: "Mod" Permission: … + "Hard delete posts/threads" + "Ban"
Site settings:
Role: "HardCaptchaSolver" Permission: "Create User"
What is still TBD is how to implement this. I'm fine with storing this in the database and caching with redis, but I'm still not sure how to model the permissions. Apparently this is called role-based access control (RBAC), pic related.
https://en.wikipedia.org/wiki/Role-based_access_control
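One possible way to model the board/site settings above, as a Python sketch. It assumes the "…" in each role means "everything the previous role has", modelled as a parent link, and that grants are keyed by scope (a board or the whole site). The permission strings, SITE and is_allowed are all hypothetical:

```python
from dataclasses import dataclass

SITE = "*"  # hypothetical scope name for site-wide grants

@dataclass
class Role:
    name: str
    permissions: frozenset
    parent: "Role | None" = None  # role whose permissions are inherited

    def all_permissions(self):
        inherited = self.parent.all_permissions() if self.parent else frozenset()
        return self.permissions | inherited

# Board roles for /leftypol/, each one extending the previous.
easy = Role("EasyCaptchaSolver", frozenset({"post", "report"}))
medium = Role("MediumCaptchaSolver", frozenset({"post_with_image"}), parent=easy)
hard = Role("HardCaptchaSolver", frozenset({"create_thread"}), parent=medium)
janitor = Role("Janitor", frozenset({"soft_delete"}), parent=hard)
mod = Role("Mod", frozenset({"hard_delete", "ban"}), parent=janitor)

# (scope, role name) -> Role; in practice this would live in the database.
grants = {("leftypol", r.name): r for r in (easy, medium, hard, janitor, mod)}
grants[(SITE, "HardCaptchaSolver")] = Role(
    "HardCaptchaSolver", frozenset({"create_user"})
)

def is_allowed(role_names, scope, permission):
    """True if any role the session holds grants the permission in that scope."""
    for name in role_names:
        role = grants.get((scope, name))
        if role and permission in role.all_permissions():
            return True
    return False
```

With this shape the default "anonymous" role is just another entry in grants, and a user-only board is one where that entry has no read permission.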