Failed to connect to database: Access denied for user 'mogilefs #1
Open
molele2 wants to merge 418 commits into hachi:master from mogilefs:master
Conversation
We need to ensure we behave sanely when dealing with larger files that require multiple reads.
This matches the buffer size used by replication, and showed a performance increase when timing the 100M large file test in t/40-httpfile.t. With the following patch, I measured a drop from ~46s to ~27s for both MD5 methods after increasing the buffer size.

--- a/t/40-httpfile.t
+++ b/t/40-httpfile.t
@@ -125,5 +125,12 @@ $expect = $expect->digest;
 @paths = $mogc->get_paths("largefile");
 $file = MogileFS::HTTPFile->at($paths[0]);
 ok($size == $file->size, "big file size match $size");
+use Time::HiRes qw/tv_interval gettimeofday/;
+
+my $t0;
+$t0 = [gettimeofday];
 ok($file->md5_mgmt(sub {}) eq $expect, "md5_mgmt on big file");
+print "mgmt ", tv_interval($t0), "\n";
+$t0 = [gettimeofday];
 ok($file->md5_http(sub {}) eq $expect, "md5_http on big file");
+print "http ", tv_interval($t0), "\n";
Base64 requires further escaping for our tracker protocol, which gets ugly and confusing. Hex is also easier to interact with and verify using existing command-line tools.
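For illustration, a short sketch (not from this branch) of why hex plays more nicely with command-line tools; it only uses Digest::MD5's exported helpers:

  use Digest::MD5 qw(md5_hex md5_base64);

  my $data = "example contents\n";
  print md5_hex($data), "\n";     # same form md5sum prints, easy to compare
  print md5_base64($data), "\n";  # may contain '+' and '/', which would need
                                  # extra escaping in the tracker protocol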
We need a place to store mappings for various checksum types we'll support.
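A minimal sketch of the kind of name-to-type mapping meant here; the names and numeric IDs below are illustrative, not necessarily the values this branch stores:

  # hypothetical checksum-type registry: small integers go in the database,
  # names are what users and the tracker protocol see
  my %NAME2TYPE = ('MD5' => 1, 'SHA-1' => 2, 'SHA-256' => 3);
  my %TYPE2NAME = reverse %NAME2TYPE;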
This is needed to wire up checksums to classes.
Digest::MD5 and Digest::SHA1 both support the same API for streaming data for the calculation, so we can validate our content as we stream it.
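A minimal sketch of that shared streaming API, assuming $path names the file being validated (Digest::SHA1->new offers the identical add/hexdigest calls):

  use Digest::MD5;

  open(my $fh, '<', $path) or die "open $path: $!";
  binmode($fh);
  my $digest = Digest::MD5->new;
  while (read($fh, my $buf, 1024 * 1024)) {   # feed 1MB chunks as they stream
      $digest->add($buf);
  }
  print $digest->hexdigest, "\n";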
Checksum usage will be decided on a per-class basis.
This branch is now rebased against my latest clear_cache, which allows much faster metadata updates for testing.
Helps me keep my head straight.
This can come in handy.
We'll use the "Digest" class in Perl as a guide for this. Only MD5 is officially supported; however, this *should* also support SHA-(1|256|384|512), and it's easy to add more algorithms.
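A short sketch of the Digest-style interface being used as the guide; the algorithm names follow Perl's Digest module:

  use Digest;

  # Digest->new() dispatches on the algorithm name, which is what makes
  # adding further algorithms mostly a naming exercise
  for my $alg ('MD5', 'SHA-1', 'SHA-256') {
      my $d = Digest->new($alg);
      $d->add("some bytes");
      print "$alg: ", $d->hexdigest, "\n";
  }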
We can now:
* enable checksums for classes
* save client-provided checksums to the database
* verify them on create_close
* read them in file_info
We need to be able to both enable and disable checksumming for a class.
This returns undef if a checksum is missing for a class, and a MogileFS::Checksum object if it exists.
Replication now lazily generates checksums if they're not provided by the client (but required by the storage class). Replication may also verify checksums if they're available in the database. Replication now sets the Content-MD5 header on PUT requests, in case the remote server is capable of rejecting corrupt transfers based on it, and attempts to verify the checksum of the freshly PUT file. TODO: the monitor will attempt a "test write" with a mangled Content-MD5 to determine whether storage backends are Content-MD5-capable, so replication can avoid rereading the checksum on the destination.
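For reference, a hedged sketch of how a Content-MD5 header value is formed (per RFC 1864: base64 of the raw 16-byte MD5 digest); $body and the request object are assumptions of the sketch, not this branch's code:

  use Digest::MD5 qw(md5);
  use MIME::Base64 qw(encode_base64);

  my $content_md5 = encode_base64(md5($body), "");   # "" => no trailing newline
  # e.g. $req->header('Content-MD5' => $content_md5) before issuing the PUT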
This functionality (and a server capable of rejecting bad MD5s) will allow us to skip an expensive MogileFS::HTTPFile->digest request at replication time. Also testing with the following patch to Perlbal:

--- a/lib/mogdeps/Perlbal/ClientHTTP.pm
+++ b/lib/mogdeps/Perlbal/ClientHTTP.pm
@@ -22,6 +22,7 @@ use fields ('put_in_progress', # 1 when we're currently waiting for an async job
             'content_length',         # length of document being transferred
             'content_length_remain',  # bytes remaining to be read
             'chunked_upload_state',   # bool/obj: if processing a chunked upload, Perlbal::ChunkedUploadState object, else undef
+            'md5_ctx',
             );
 use HTTP::Date ();
@@ -29,6 +30,7 @@ use File::Path;
 use Errno qw( EPIPE );
 use POSIX qw( O_CREAT O_TRUNC O_WRONLY O_RDONLY ENOENT );
+use Digest::MD5;
 # class list of directories we know exist
 our (%VerifiedDirs);
@@ -61,6 +63,7 @@ sub init {
     $self->{put_fh} = undef;
     $self->{put_pos} = 0;
     $self->{chunked_upload_state} = undef;
+    $self->{md5_ctx} = undef;
 }
 sub close {
@@ -134,6 +137,8 @@ sub handle_put {
     return $self->send_response(403) unless $self->{service}->{enable_put};
+    $self->{md5_ctx} = $hd->header('Content-MD5') ? Digest::MD5->new : undef;
+
     return if $self->handle_put_chunked;
     # they want to put something, so let's setup and wait for more reads
@@ -421,6 +426,8 @@ sub put_writeout {
     my $data = join("", map { $$_ } @{$self->{read_buf}});
     my $count = length $data;
+    my $md5_ctx = $self->{md5_ctx};
+    $md5_ctx->add($data) if $md5_ctx;
     # reset our input buffer
     $self->{read_buf} = [];
@@ -460,6 +467,17 @@ sub put_close {
     if (CORE::close($self->{put_fh})) {
         $self->{put_fh} = undef;
+
+        my $md5_ctx = $self->{md5_ctx};
+        if ($md5_ctx) {
+            my $actual = $md5_ctx->b64digest;
+            my $expect = $self->{req_headers}->header("Content-MD5");
+            $expect =~ s/=+\s*\z//;
+            if ($actual ne $expect) {
+                return $self->send_response(400,
+                    "Content-MD5 mismatch, expected: $expect actual: $actual");
+            }
+        }
         return $self->send_response(200);
     } else {
         return $self->system_error("Error saving file", "error in close: $!");
Rereading a large file is expensive. If the monitor can observe which storage nodes reject bad MD5s, we can rely on that instead of having to reread the entire file to recalculate its MD5.
Only the fsck part remains to be implemented... And I've never studied/used fsck much :x
Stale rows are bad.
TODO: see if we can use LWP to avoid mistakes like this :x
Fsck behavior is based on the existing behavior for size mismatches. Size failures take precedence, since verifying a size match or mismatch is much cheaper than verifying a checksum. Checksum calculations are expensive, and fsck is already parallel, so we do not parallelize checksum calculations on a per-FID basis.
It reads more easily this way, at least to me.
I'll be testing checksum functionality on my home installation before testing it on other installations, and I run SQLite at home. ref: http://www.sqlite.org/lang_altertable.html
We need to ensure the worker stays alive during MD5 generation, especially on large files that can take many seconds to verify.
This special-cases "NONE" to mean "no hash" for our users.
We don't actually use the BLOB type anywhere, as checksums are definitely not "L"(arge) objects.
The timeout comparison was wrong, causing ping_cb to never fire. This went unnoticed because I have reasonably fast disks on my storage nodes and the <$sock> read was able to complete before being hit by a watchdog timeout.
Enabling this setting allows fsck to checksum all replicas on all devices and report any corrupted copies, regardless of per-class settings. This feature is useful for determining whether enabling checksums on certain classes is necessary, and it will also benefit users who cannot afford to store checksums in the database.
MD5 is faster than SHA1, and much faster than any of the SHA2 variants. Given the time penalty of fsck is already high with MD5, prevent folks from shooting themselves in the foot with extremely expensive hash algorithms.
Unlike the setting it replaces, this new setting can be used to disable checksumming entirely, regardless of per-class options.

fsck_checksum=(class|off|MD5)
  class - the default; fsck based on the per-class hashtype
  off   - skip all checksumming regardless of per-class settings
  MD5   - same as the previous fsck_auto_checksum=MD5
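For illustration, the server setting would normally be flipped with mogadm's settings command (assuming a standard mogadm/tracker setup; the values follow the list above):

  mogadm settings set fsck_checksum off     # skip all fsck checksumming
  mogadm settings set fsck_checksum MD5     # checksum every replica with MD5
  mogadm settings set fsck_checksum class   # default: per-class hashtype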
This defines the size of the HTTP connection pool. It currently affects all workers, but is likely most interesting to the monitor, as it limits the number of devices the monitor may concurrently update. It defaults to 20 (the long-existing, hard-coded value). In the future there may be an easy way to specify this on a per-worker basis, but for now it affects all workers.
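A hedged example of how this could look in mogilefsd.conf, following the key = value format used elsewhere in this thread (the value 40 is arbitrary):

  # raise the HTTP connection pool from the default of 20
  conn_pool_size = 40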
Blindly attempting to write to a socket before a TCP connection can be established returns EAGAIN on Linux, but not on FreeBSD 8/9. This causes Danga::Socket to error out, as it won't attempt to buffer on anything but EAGAIN on write() attempts. Now, we buffer writes explicitly after the initial socket creation and connect(), and only call Danga::Socket::write when we've established writability. This works on Linux, too, and avoids an unnecessary syscall in most cases. Reported-by: Alex Yakovenko <[email protected]>
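A standalone sketch of the "only write once writability is established" pattern, using plain IO::Select rather than the Danga::Socket internals the change actually touches (the address and request below are placeholders):

  use IO::Socket::INET;
  use IO::Select;

  my $sock = IO::Socket::INET->new(
      PeerAddr => '192.0.2.10:7500',   # placeholder storage-node address
      Blocking => 0,                   # connect() returns immediately
  ) or die "socket: $!";

  # buffer the request ourselves; don't write until the socket is writable,
  # which behaves the same on Linux and FreeBSD
  my $pending = "GET /dev1/usage HTTP/1.0\r\n\r\n";
  if (IO::Select->new($sock)->can_write(5)) {
      syswrite($sock, $pending) // die "write: $!";
  }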
Otherwise we'll end up constantly waking up when there's nothing to write.
The timeout check may run on a socket before epoll_wait/kevent has a chance to run, giving the application no chance for any readiness callbacks to fire. This prevents timeouts in the monitor if the database is slow during synchronous UPDATE device calls (or there are just thousands of active connections).
HTTP requests time out because we had to wait synchronously for DBI; this is very noticeable on a high-latency connection. So avoid running synchronous code while asynchronous code (which is subject to timeouts) is running.
With enough devices and high enough network latency to the DB, we bump into the watchdog timeout of 30s easily.
Issuing many UPDATE statements slows down monitoring on high-latency connections between the monitor and the DB. Under MySQL, it is possible to do multiple UPDATEs in a single statement using CASE/WHEN syntax. We limit ourselves to 10000 devices per update for now; this should keep us comfortably under the max_allowed_packet size of most MySQL deployments (where the default is 1M). A compatibility function is provided for SQLite and Postgres users. SQLite users are not expected to run this over high-latency NFS, and interested Postgres users should submit their own implementation.
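A hedged sketch of the single-statement form (table and column names are illustrative, not necessarily the exact MogileFS schema):

  UPDATE device
     SET mb_used = CASE devid
                   WHEN 1 THEN 120
                   WHEN 2 THEN 450
                   WHEN 3 THEN 980
                   END
   WHERE devid IN (1, 2, 3);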
mark_fidid_unreachable has not been used since MogileFS 2.35 commit 53528c7 ("Wipe out old replication code.", r1432)
No longer used since commit ebf8a5a ("Mass nuke unused code and fix most tests") in MogileFS 2.50
"is not unique" => "UNIQUE constraint failed". String matching is lovely.
Changelog diff is:

diff --git a/CHANGES b/CHANGES
index a6b2872..441b328 100644
--- a/CHANGES
+++ b/CHANGES
@@ -1,3 +1,29 @@
+2014-12-15: Release version 2.72
+
+  * Work with DBD::SQLite's latest lock errors (dormando <[email protected]>)
+
+  * remove update_host_property (Eric Wong <[email protected]>)
+
+  * remove users of unreachable_fids table (Eric Wong <[email protected]>)
+
+  * monitor: batch MySQL device table updates (Eric Wong <[email protected]>)
+
+  * monitor: defer DB updates until all HTTP requests are done (Eric Wong <[email protected]>)
+
+  * connection/poolable: defer expiry of timed out connections (Eric Wong <[email protected]>)
+
+  * connection/poolable: disable watch_write before retrying write (Eric Wong <[email protected]>)
+
+  * connection/poolable: do not write before event_write (Eric Wong <[email protected]>)
+
+  * add conn_pool_size configuration option (Eric Wong <[email protected]>)
+
+  * enable TCP keepalives for iostat watcher sockets (Eric Wong <[email protected]>)
+
+  * host: add "readonly" state to override device "alive" state (Eric Wong <[email protected]>)
+
+  * add LICENSE file to distro (dormando <[email protected]>)
+
 2013-08-18: Release version 2.70
 
   * This release features a very large rewrite to the Monitor worker to run
Due to a bug in the MultipleNetworks replication policy <[email protected]>, a network split caused an instance to explode with over-replicated files. Since every too_happy pruning increases failcount, it could end up taking days to clean up a file with far too many replicas.
The readonly host state was not enabled via mogdbsetup and could not be used although the code supports it, making the schema version bump to 16 a no-op. This bumps the schema version to 17. Add a test using mogadm to ensure the setting is changeable, as the existing test for this state did not rely on the database. This was also completely broken with Postgres before, as Postgres currently offers no way to modify constraints in-place. Constraints must be dropped and re-added instead. Note: it seems the upgrade_add_device_* functions in Postgres.pm are untested as well and never got used. Perhaps they ought to be removed entirely since those device columns predate Postgres support.
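For illustration, the Postgres drop-and-re-add dance looks roughly like this (the constraint name and status list are assumptions, not copied from Postgres.pm):

  ALTER TABLE host DROP CONSTRAINT host_status_check;
  ALTER TABLE host ADD CONSTRAINT host_status_check
      CHECK (status IN ('alive', 'dead', 'down', 'readonly'));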
Perl buffered IO is only reading 8K at a time (or only 4K on older versions!) despite us requesting to read in 1MB chunks. This wastes syscalls and can affect TCP window scaling when MogileFS is replicating across long fat networks (LFN). While we're at it, this fixes a long-standing FIXME item to perform proper timeouts when reading headers as we're forced to do sysread instead of line-buffered I/O. ref: https://rt.perl.org/Public/Bug/Display.html?id=126403 (and confirmed by strace-ing replication workers)
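A minimal sketch of the unbuffered read loop this implies; $sock is the replication source's HTTP socket and process() is a hypothetical consumer:

  my $bufsiz = 1024 * 1024;          # ask for 1MB per syscall
  while (1) {
      my $n = sysread($sock, my $buf, $bufsiz);
      die "read error: $!" unless defined $n;   # sysread bypasses PerlIO,
      last if $n == 0;                          # so each call is one read(2)
      process($buf);
  }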
* bogomips/fix-readonly: enable DB upgrade for host readonly state
* bogomips/fsck-recheck: fsck: this avoid redundant fsck log entries
* bogomips/fsck-found-order: fsck: do not log FOND if note_on_device croaks
* bogomips/prune-too_happy-v3: replicate: reduce backoff for too_happy FIDs
* bogomips/resurrect-device: reaper: detect resurrection of "dead" devices
Perl 5.18 stable and later (commit a7b39f85d7caac) introduced a warning for restarting `each` after hash modification. While we accounted for this undefined behavior and documented it in the past, this may still cause maintenance problems in the future despite our current workarounds being sufficient. In any case, keeping idle sockets around is cheap with modern APIs, and conn_pool_size was introduced in 2.72 to avoid dropping idle connections at all; so _conn_drop_idle may never be called on a properly configured tracker. Mailing list references: <CABJfL5jiAGC+5JzZjuW7R_NXs1DShHPGsKnjzXrPbjWOy2wi3g@mail.gmail.com> <[email protected]>
On *BSD platforms, the accept()-ed clients inherit the O_NONBLOCK file flag from the listen socket. This is not true on Linux, and I noticed sockets blocking on write() syscalls via strace. Checking the octal 04000 (O_NONBLOCK) flag in /proc/$PID/fdinfo/$FD for client TCP sockets confirms O_NONBLOCK was not set. This also makes us resilient to spurious wakeups causing event_read to get stuck, as documented in the Linux select(2) manpage.
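A minimal sketch of explicitly marking the accepted client socket non-blocking, since Linux does not inherit the flag from the listener (port and backlog are illustrative):

  use IO::Socket::INET;

  my $listener = IO::Socket::INET->new(
      LocalPort => 7001, Listen => 128, ReuseAddr => 1,
  ) or die "listen: $!";

  my $csock = $listener->accept or die "accept: $!";
  $csock->blocking(0);   # explicitly clear the blocking flag: *BSD inherits
                         # O_NONBLOCK from the listener, Linux does not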
Make client query processing less aggressive and more fair by only enqueueing a single worker request at a time. Pipelined requests in the read buffer will only be handled after successful writes, and any incomplete writes will block further request processing. Furthermore, add a watchdog for clients we're writing to, so we can expire clients which are not reading our responses. Danga::Socket allows clients to use an infinite amount of space for buffering, and it's possible for dead sockets to go undetected for hours by the OS. Use a watchdog to kick out any sockets which have made no forward progress after two minutes.
This avoids the odd case where the first write completes but the second one (for 3 bytes: ".\r\n") does not, causing a client to have both read and write watchability enabled after the previous commit to stop reads when writes do not complete. This would not be fatal, but it breaks the rule that clients should only be reading or writing exclusively, never doing both, as that could lead to pathological memory usage. This also reduces client wakeups and TCP overhead on TCP_NODELAY sockets by avoiding a small packet (".\r\n") after the main response.
Otherwise it'll be possible to pipeline admin (!) commands and event_read will trigger EOF before all the admin commands are processed in read_buf.
* client-backpressure: client: always disable watch_read after a command client: use single write for admin commands tracker: client fairness, backpressure, and expiry client connection should always be nonblocking
* bogomips/replicate-nobuf: replicate: avoid buffered IO on reads
* bogomips/conn-pool-each: ConnectionPool: avoid undefined behavior for hash iteration
If DevFID::size_on_disk encounters an unreadable (dead) device AND there are no HTTP requests pending, we must ensure Danga::Socket runs the PostLoopCallback to check whether the event loop is complete. Do that by scheduling another timer to run immediately.
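A minimal sketch of that immediate-timer nudge (the empty callback is the point: the wakeup alone lets Danga::Socket run its PostLoopCallback):

  use Danga::Socket;

  # zero-delay timer: fires on the next loop iteration even though no
  # HTTP requests are pending, so the post-loop completion check runs
  Danga::Socket->AddTimer(0, sub {});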
* fsck-timeout: fsck: avoid infinite wait on dead devices
Changelog diff is:

diff --git a/CHANGES b/CHANGES
index 441b328..e053851 100644
--- a/CHANGES
+++ b/CHANGES
@@ -1,3 +1,29 @@
+2018-01-18: Release version 2.73
+
+  * fsck: avoid infinite wait on dead devices (Eric Wong <[email protected]>)
+
+  * client: always disable watch_read after a command (Eric Wong <[email protected]>)
+
+  * client: use single write for admin commands (Eric Wong <[email protected]>)
+
+  * tracker: client fairness, backpressure, and expiry (Eric Wong <[email protected]>)
+
+  * client connection should always be nonblocking (Eric Wong <[email protected]>)
+
+  * ConnectionPool: avoid undefined behavior for hash iteration (Eric Wong <[email protected]>)
+
+  * replicate: avoid buffered IO on reads (Eric Wong <[email protected]>)
+
+  * enable DB upgrade for host readonly state (Eric Wong <[email protected]>)
+
+  * replicate: reduce backoff for too_happy FIDs (Eric Wong <[email protected]>)
+
+  * fsck: this avoid redundant fsck log entries (Eric Wong <[email protected]>)
+
+  * fsck: do not log FOND if note_on_device croaks (Eric Wong <[email protected]>)
+
+  * reaper: detect resurrection of "dead" devices (Eric Wong <[email protected]>)
+
 2014-12-15: Release version 2.72
 
   * Work with DBD::SQLite's latest lock errors (dormando <[email protected]>)
I tried this:
Step 1: on the MySQL database:
mysql> create database mogilefs;
Query OK, 1 row affected (0.00 sec)
mysql> grant all on mogilefs.* to 'mogilefs'@'%' identified by 'mogilefs';
Query OK, 0 rows affected (0.00 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
Step 2:
mogdbsetup --dbhost=192.168.2.161 --dbport=3306 --dbname=mogilefs --dbrootuser=root --dbrootpass=198456 --dbuser=mogilefs --dbpass=mogilefs
Step 3:
su mogilefs
bash-4.1$ mogilefsd -c /etc/mogilefs/mogilefsd.conf --daemon
Failed to connect to database: Access denied for user 'mogilefs'@'192.168.2.161' (using password: YES) at /usr/local/share/perl5/MogileFS/Store.pm line 388.
mogilefsd.conf
daemonize = 1
pidfile = /var/run/mogilefsd/mogilefsd.pid
db_dsn = DBI:mysql:mogilefs:host=192.168.2.161
db_user = mogilefs
db_pass = mogilefs
listen = 192.168.2.161:7001
conf_port = 7001
query_jobs = 10
delete_jobs = 1
replicate_jobs = 5
reaper_jobs = 1
[root@bcZmmpe81nZ90s MogileFS]# ls -ld /var/run/mogilefsd
drwxr-xr-x. 2 mogilefs mogilefs 4096 Sep 6 23:42 /var/run/mogilefsd